test(jepsen): add ZSet safety workload with model-based checker#550

Open
bootjp wants to merge 32 commits into main from feat/jepsen-zset-safety
Owner

@bootjp bootjp commented Apr 19, 2026

Summary

Adds a Jepsen workload (elastickv-zset-safety-test) that verifies ZSet-specific safety properties under faults (network partitions, node kills), using a custom model-based checker. It goes beyond the simple add->read visibility check.

Properties verified:

  • Score correctness: a ZRANGE result's score for a member must equal the model's latest committed score, OR equal a score written by an operation concurrent with the read.
  • Order preservation: ZRANGE 0 -1 must be sorted by (score asc, member lex asc).
  • ZRANGEBYSCORE correctness: bounded range queries return exactly the members whose score falls in the bound, modulo concurrent mutations.
  • No phantom members: every read member must have been introduced by some successful or in-flight ZADD/ZINCRBY.
  • Atomicity: the checker treats every :ok operation as atomic; any visible inconsistency is reported.
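
As an illustration of the order-preservation property above, here is a minimal sketch of a sortedness check over a parsed ZRANGE result. The function name and the `[member score]` entry shape are assumptions for this example, not the PR's actual checker code, and it compares members with JVM string ordering, which matches Redis's byte-wise ordering only for ASCII members.

```clojure
;; Hypothetical helper, not taken from the PR.
(defn sorted-by-score-then-member?
  "True when `entries`, a sequence of [member score] pairs, is in
  ascending (score, member) order."
  [entries]
  (let [sort-keys (map (fn [[member score]] [score member]) entries)]
    ;; Every adjacent pair must be non-decreasing. Vectors of equal
    ;; length compare element-wise: score first, then member.
    (every? (fn [[a b]] (not (pos? (compare a b))))
            (partition 2 1 sort-keys))))

(sorted-by-score-then-member? [["a" 1.0] ["b" 1.0] ["c" 2.0]]) ; ordered
(sorted-by-score-then-member? [["b" 1.0] ["a" 1.0]])           ; tie broken the wrong way
```

An empty or single-element result is trivially sorted, since there are no adjacent pairs to compare.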

Concurrent-ZADD handling uses an invoke/complete windowing approach. A mutation is "committed before" a read iff its :complete index is strictly less than the read's :invoke index. Mutations whose intervals overlap are "concurrent" and contribute to a per-member allowed-score set. Indeterminate (:info) mutations are treated as possibly-concurrent. ZINCRBY whose response is unknown sets :unknown-score? so the checker skips the strict score check for concurrent reads. ZREM carries the actual removed? boolean from the server reply so a no-op ZREM does not falsely mark the member as deleted.
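
The windowing rule can be sketched as a small classifier. This is an illustrative reading of the paragraph above, not the PR's implementation: the function name and the returned keywords are assumptions, and the `:fail` branch relies on the usual Jepsen convention that a `:fail` op definitely did not take effect.

```clojure
;; Hypothetical classifier, not taken from the PR.
(defn classify-mutation
  "Classify `mutation` relative to a read invoked at `read-inv-idx` and
  completed at `read-cmp-idx`. A nil :complete-idx means still pending."
  [mutation read-inv-idx read-cmp-idx]
  (let [{:keys [invoke-idx complete-idx]} mutation
        op-type (:type mutation)]
    (cond
      ;; Jepsen convention: :fail ops are known not to have happened.
      (= :fail op-type) :not-applied
      ;; Invoked only after the read finished: the read cannot see it.
      (> invoke-idx read-cmp-idx) :after-read
      ;; Acknowledged and complete strictly before the read began.
      (and (= :ok op-type)
           (some? complete-idx)
           (< complete-idx read-inv-idx)) :committed-before
      ;; Overlapping, pending, or indeterminate (:info).
      :else :possibly-concurrent)))

(classify-mutation {:type :ok :invoke-idx 0 :complete-idx 1} 2 3)
(classify-mutation {:type :info :invoke-idx 0 :complete-idx 1} 2 3)
```

Under this sketch only `:committed-before` mutations feed the authoritative model; everything `:possibly-concurrent` widens the per-member allowed-score set instead.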

Files

  • New: jepsen/src/elastickv/redis_zset_safety_workload.clj
  • New: jepsen/test/elastickv/redis_zset_safety_workload_test.clj (test-spec construction + checker edge cases: no-op ZREM, :info ZINCRBY, deterministic score mismatch).
  • Modified: jepsen/src/elastickv/jepsen_test.clj (entry point added).
  • Modified: .github/workflows/jepsen-test.yml (5s smoke run on every push).
  • Modified: .github/workflows/jepsen-test-scheduled.yml (150s default run every 6h).

Running locally

The workload is invoked directly via its own -main, not through jepsen-test/-main:

cd jepsen
lein run -m elastickv.redis-zset-safety-workload \
  --time-limit 60 --rate 10 --concurrency 5 \
  --ports 63791,63792,63793 --host 127.0.0.1

(elastickv.jepsen-test exposes elastickv-zset-safety-test only as a Clojure function for REPL use; CI and ad-hoc runs use the namespace's own -main.)

Test plan

  • lein test elastickv.redis-zset-safety-workload-test passes (Java 21).
  • Run with partition + kill faults: confirm checker emits a clear failure when a stale-leader read returns a divergent score.

Summary by CodeRabbit

  • Tests
    • Added a new safety test suite for Redis ZSet operations, validating data consistency and correctness under concurrent operations and fault conditions.
    • Extended CI/CD pipeline to automatically run ZSet safety tests in both scheduled and on-demand test workflows.

Adds a Jepsen workload that goes beyond add->read visibility and verifies
ZSet-specific safety properties under faults (network partitions, node
kills):

- score correctness: a ZRANGE result's score for a member must equal the
  model's latest committed score for that member, OR equal a score
  written by an operation that is concurrent with the read (since the
  client cannot linearise concurrent writes to the same member).
- order preservation: ZRANGE 0 -1 must be sorted by (score asc, member
  lex asc).
- ZRANGEBYSCORE correctness: bounded range queries return exactly the
  members whose score falls in the bound, modulo concurrent mutations.
- no phantom members: every read member must have been introduced by some
  successful or in-flight ZADD/ZINCRBY.

Concurrent-ZADD handling uses an invoke/complete windowing approach:
mutations whose complete index < read's invoke index are committed
before the read; mutations whose intervals overlap are concurrent and
contribute to the per-member allowed-score set. Indeterminate (:info)
mutations are treated as possibly-concurrent.

Workload entry point added to jepsen_test.clj as elastickv-zset-safety-test.
@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@coderabbitai

coderabbitai Bot commented Apr 19, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 47 minutes and 30 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f8a1bfcf-8ed4-4921-929f-513e70b96911

📥 Commits

Reviewing files that changed from the base of the PR and between 9bfcc13 and 29e62ca.

📒 Files selected for processing (5)
  • .github/workflows/jepsen-test-scheduled.yml
  • .github/workflows/jepsen-test.yml
  • jepsen/src/elastickv/jepsen_test.clj
  • jepsen/src/elastickv/redis_zset_safety_workload.clj
  • jepsen/test/elastickv/redis_zset_safety_workload_test.clj
📝 Walkthrough

Walkthrough

This pull request introduces a new Redis ZSet safety Jepsen workload (elastickv.redis-zset-safety-workload) to test elastickv's Redis implementation under faults. It includes CI integration in two scheduled/on-demand workflows, the workload implementation with a model-based checker, and comprehensive unit tests validating checker behavior.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CI Workflow Integration: .github/workflows/jepsen-test-scheduled.yml, .github/workflows/jepsen-test.yml | Added new CI steps to run the Redis ZSet safety workload with timeout controls, rate limiting, and port/host configuration alongside existing Redis workload tests. |
| Workload Registration: jepsen/src/elastickv/jepsen_test.clj | Added namespace require for elastickv.redis-zset-safety-workload and exposed a new elastickv-zset-safety-test function to integrate the ZSet safety workload into the CLI. |
| Workload Implementation: jepsen/src/elastickv/redis_zset_safety_workload.clj | Implemented new Jepsen workload namespace with client operations (:zadd, :zincrby, :zrem, :zrange-all, :zrangebyscore), a model-based checker validating ordering/completeness/score consistency, fault configuration, and CLI option parsing. |
| Test Coverage: jepsen/test/elastickv/redis_zset_safety_workload_test.clj | Added unit tests for workload construction, option overrides, and checker validation logic across edge cases (no-op ZREM, uncertain scores from :info operations, score mismatches). |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Redis
    participant Nemesis as Nemesis (Fault Injection)
    participant Checker

    Client->>Redis: ZADD/ZINCRBY/ZREM/ZRANGE operation
    Redis-->>Client: Response (or timeout/error)
    Note over Client: Record invoke/complete<br/>with type (:ok/:info)
    
    Nemesis->>Redis: Inject faults (partition/stop/etc.)
    Redis-->>Client: Degraded/failed responses
    
    Nemesis->>Redis: Heal/restore
    Redis-->>Client: Recover operations
    
    Checker->>Checker: Extract mutation windows<br/>per operation
    Checker->>Checker: Build per-member<br/>committed state
    Checker->>Checker: Validate ordering<br/>(score ↑, then member ↑)
    Checker->>Checker: Check score correctness<br/>vs committed/concurrent
    Checker->>Checker: Enforce completeness<br/>(no phantom members)
    Checker-->>Checker: Produce :valid?

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 A hop through the ZSet maze!

New safety workload hops in place,
Checking scores with fuzzy grace,
When faults crash through, the checker knows,
Each member's rank and all it owes—
Redis ZSets now stand the test! 🎯

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'test(jepsen): add ZSet safety workload with model-based checker' accurately summarizes the main change: introducing a new Jepsen test workload for ZSet safety with a model-based checker, which is reflected across all modified and added files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


- jepsen-test.yml: 5s smoke run on every push, mirroring the other
  workloads.
- jepsen-test-scheduled.yml: 150s default run (overridable via
  workflow_dispatch inputs) every 6 hours.

Workload entry: elastickv.redis-zset-safety-workload (added in the
previous commit via -main).
Contributor

Copilot AI left a comment


Pull request overview

Adds a new Jepsen workload to validate elastickv’s Redis ZSet safety properties (scores, ordering, range correctness, and phantom detection) under faults using a custom model-based checker, and wires a helper entrypoint into the Jepsen test namespace.

Changes:

  • Introduces redis_zset_safety_workload.clj with a Carmine-based client, randomized op generator, and a custom checker for ZSet safety properties.
  • Adds a new wrapper function in jepsen_test.clj intended to expose the new workload.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| jepsen/src/elastickv/redis_zset_safety_workload.clj | New ZSet safety Jepsen workload with custom model/checker and CLI entrypoint. |
| jepsen/src/elastickv/jepsen_test.clj | Adds a wrapper function and require for the new ZSet safety test. |

Comment on lines +476 to +526
(defn elastickv-zset-safety-test
  "Builds a Jepsen test map that drives elastickv's Redis ZSet safety
  workload."
  ([] (elastickv-zset-safety-test {}))
  ([opts]
   (let [nodes       (or (:nodes opts) default-nodes)
         redis-ports (or (:redis-ports opts)
                         (repeat (count nodes) (or (:redis-port opts) 6379)))
         node->port  (or (:node->port opts)
                         (cli/ports->node-map redis-ports nodes))
         local?      (:local opts)
         db          (if local?
                       jdb/noop
                       (ekdb/db {:grpc-port    (or (:grpc-port opts) 50051)
                                 :redis-port   node->port
                                 :raft-groups  (:raft-groups opts)
                                 :shard-ranges (:shard-ranges opts)}))
         rate        (double (or (:rate opts) 10))
         time-limit  (or (:time-limit opts) 60)
         faults      (if local?
                       []
                       (cli/normalize-faults (or (:faults opts) [:partition :kill])))
         nemesis-p   (when-not local?
                       (combined/nemesis-package {:db       db
                                                  :faults   faults
                                                  :interval (or (:fault-interval opts) 40)}))
         nemesis-gen (if nemesis-p
                       (:generator nemesis-p)
                       (gen/once {:type :info :f :noop}))
         workload    (elastickv-zset-safety-workload
                      (assoc opts :node->port node->port))]
     (merge workload
            {:name            (or (:name opts) "elastickv-redis-zset-safety")
             :nodes           nodes
             :db              db
             :redis-host      (:redis-host opts)
             :os              (if local? os/noop debian/os)
             :net             (if local? net/noop net/iptables)
             :ssh             (merge {:username "vagrant"
                                      :private-key-path "/home/vagrant/.ssh/id_rsa"
                                      :strict-host-key-checking false}
                                     (when local? {:dummy true})
                                     (:ssh opts))
             :remote          control/ssh
             :nemesis         (if nemesis-p (:nemesis nemesis-p) nemesis/noop)
             :final-generator nil
             :concurrency     (or (:concurrency opts) 5)
             :generator       (->> (:generator workload)
                                   (gen/nemesis nemesis-gen)
                                   (gen/stagger (/ rate))
                                   (gen/time-limit time-limit))}))))
Owner Author


Stale. Unit tests for this workload were added in commit ecb3983 under jepsen/test/elastickv/redis_zset_safety_workload_test.clj — they cover test-spec construction (builds-test-spec, custom-options-override-defaults) plus checker edge cases: no-op ZREM, :info ZINCRBY, phantoms, stale reads, superseded scores, concurrent ZADD/ZREM, :info ZREM absence, :info ZINCRBY completeness, infinity parsing, duplicate-member flagging. Latest commit 0c0efc4 adds two more for the single-ok / multi-ZINCRBY score check.

Comment on lines +251 to +254
:zrem
(let [m (:value invoke)]
  {:f :zrem :member m :score nil :zrem? true
   :type t :invoke-idx inv-idx :complete-idx cmp-idx})))))
Owner Author


Stale. ZREM now carries :removed? from the server reply into the mutation record (see completed-mutation-window lines 283-295 and apply-mutation-to-state/existence-evidence?). A no-op ZREM (returns 0) does NOT mutate model state, and :info/:pending ZREMs are treated as possibly-concurrent deletions via the concurrent window. Regression test: noop-zrem-does-not-flag-correct-read and no-op-zrem-alone-does-not-false-positive. Landed in 2a194a4 / ecb3983.
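
To make the no-op ZREM rule concrete, here is a minimal sketch of a state-transition function that leaves the model untouched when `:removed?` is false. It is not the PR's `apply-mutation-to-state`, whose body is not shown in this thread; the name `apply-mutation-sketch` is hypothetical, and the `{member -> {:present? bool :score s}}` state shape is borrowed from the docstring quoted elsewhere in the review. It assumes only `:ok` mutations with known scores are applied.

```clojure
;; Hypothetical stand-in for the PR's apply-mutation-to-state helper.
(defn apply-mutation-sketch
  "Fold one committed mutation into `state`, a map of
  member -> {:present? bool :score s}."
  [state {:keys [f member score removed?]}]
  (case f
    :zadd    (assoc state member {:present? true :score score})
    :zincrby (assoc state member {:present? true :score score})
    ;; Only a ZREM that actually removed something marks the member
    ;; absent; a no-op ZREM (server replied 0) changes nothing.
    :zrem    (if removed?
               (assoc state member {:present? false :score nil})
               state)
    state))

(reduce apply-mutation-sketch {}
        [{:f :zadd :member "m1" :score 5.0}
         {:f :zrem :member "m1" :removed? false}])
```

With this shape, a history of `ZADD m1 5` followed by a no-op `ZREM m1` still models `m1` as present with score 5.0, so a read returning it is not flagged.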

Comment on lines +244 to +249
:zincrby
(let [[m _delta] (:value invoke)
      s          (when (and (= :ok t) (vector? (:value complete)))
                   (second (:value complete)))]
  {:f :zincrby :member m :score (some-> s double)
   :type t :invoke-idx inv-idx :complete-idx cmp-idx})
Owner Author


Stale. completed-mutation-window sets :unknown-score? on each :info/:pending ZINCRBY (line 277-279), and allowed-scores-for-member propagates that to the caller so check-zrange-all/check-zrangebyscore skip strict score validation when the resulting score is unknown. Regression test: info-zincrby-skips-strict-score-check, info-before-read-is-considered-uncertain. Landed in 2a194a4 / ecb3983, refined in 0c0efc4 per Codex P1.
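
The effect of the `:unknown-score?` flag on the read check can be sketched as follows. The `:scores` and `:unknown-score?` keys mirror the return value described for allowed-scores-for-member in this PR; the function name `score-acceptable?` is an assumption made for illustration and does not appear in the source.

```clojure
;; Hypothetical predicate, not taken from the PR.
(defn score-acceptable?
  "True when `observed-score` is consistent with `allowed`, a map with
  :scores (set of acceptable scores) and :unknown-score? (boolean)."
  [allowed observed-score]
  (boolean
   ;; An in-flight or :info ZINCRBY could have produced any score, so
   ;; the strict membership check is skipped to avoid false positives.
   (or (:unknown-score? allowed)
       (contains? (:scores allowed) observed-score))))

(score-acceptable? {:scores #{5.0} :unknown-score? false} 7.0)
(score-acceptable? {:scores #{5.0} :unknown-score? true} 7.0)
```

Skipping the check trades detection power for soundness: while an indeterminate increment is outstanding, a genuinely wrong score for that member would go unreported.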

Comment on lines +271 to +274
strictly precede `read-inv-idx`. Model maps member -> {:score s} or
marks member as :deleted. Returns {:members map :ok-members set}.
Only considers :ok mutations for the authoritative model; :info
mutations are treated as uncertain (neither strictly applied nor not)."
Owner Author


Stale. Docstring rewritten on line 323-327: "Construct authoritative per-member state from mutations whose completions strictly precede read-inv-idx. Returns {member -> {:present? bool :score s}}". The {:members map :ok-members set} wording no longer appears in the file.

Comment thread jepsen/src/elastickv/jepsen_test.clj Outdated
Comment on lines 18 to 23
(defn elastickv-zset-safety-test []
(zset-safety-workload/elastickv-zset-safety-test {}))

(defn -main
[& args]
(cli/run! (cli/single-test-cmd {:test-fn elastickv-test}) args))
Owner Author


Stale. jepsen_test.clj now dispatches on the first positional arg via the test-fns map (lines 21-29), auto-prepending test if the user omitted the jepsen subcommand. elastickv-zset-safety-test is reachable via lein run -m elastickv.jepsen-test elastickv-zset-safety-test .... The workload namespace's own -main also works for direct invocation.

Three Copilot findings on PR #550:

1. :zincrby indeterminate handling. Pending or :info ZINCRBY left the
   resulting score unknown, but the checker still required the
   observed read score to be in the finite allowed-scores set. A read
   that legitimately observed an in-flight increment was flagged as a
   score mismatch (false positive).

   completed-mutation-window now sets :unknown-score? on a ZINCRBY when
   the completion is :info or pending. allowed-scores-for-member
   returns :unknown-score? when any concurrent ZINCRBY carries the
   flag, and check-zrange-all / check-zrangebyscore skip the strict
   score-membership check in that case.

2. :zrem no-op handling. ZREM of a never-added member returns 0
   server-side (no-op). The previous model treated every ZREM as a
   deletion, producing missing-member false positives and
   score-mismatch false negatives.

   invoke! already exposes the actual removed? boolean as the second
   element of the completion value. completed-mutation-window now
   threads :removed? through, and the new apply-mutation-to-state
   helper leaves state unchanged when :removed? is false.

3. model-before docstring claimed it returned {:members map :ok-members
   set}, but it returned the model map directly. Docstring rewritten to
   match the actual return value.

Adds jepsen/test/elastickv/redis_zset_safety_workload_test.clj
covering test-spec construction, the no-op ZREM edge case, the :info
ZINCRBY skip, and a positive-control score-mismatch detection. The
checker tests bypass the timeline.html sub-checker (which writes to
the test store) by invoking zset-safety-checker directly.
Owner Author

bootjp commented Apr 19, 2026

Fixed the Copilot findings in 94be1bd:

  • :zincrby indeterminate false positive: the unknown resulting score of a pending/:info ZINCRBY is now propagated via the :unknown-score? flag. When the concurrent set for a member contains a ZINCRBY with an unknown score, allowed-scores-for-member reports it and the strict score check is skipped.
  • :zrem no-op false positive: the removed? boolean that invoke! already returns is now threaded into the mutation record. A ZREM with :removed? false does not change state (the apply-mutation-to-state helper branches on it).
  • model-before docstring: corrected to match the implementation.
  • PR description: now states that the workload is invoked directly via lein run -m elastickv.redis-zset-safety-workload (consistent with CI).
  • Workload unit test: added jepsen/test/elastickv/redis_zset_safety_workload_test.clj. It deterministically verifies test-spec construction, no-op ZREM, :info ZINCRBY, and a positive-control score mismatch. The timeline.html sub-checker requires a store, so the tests bypass it and call zset-safety-checker directly.

All 5 tests in lein test elastickv.redis-zset-safety-workload-test pass (verified on Java 21).

/gemini review


…re-mismatch

CI failure on 94be1bd surfaced a remaining checker false positive: a
read that observes a member whose only prior ops are no-op ZREMs was
classified as :score-mismatch with :allowed #{} rather than treated as
a never-existed member. allowed-scores-for-member returned
any-known? = true because (seq concurrent) was truthy even when every
concurrent op was a :zrem that didn't actually remove anything.

Fix: any-known? (renamed internally to existence-evidence?) now only
counts concurrent mutations that provide evidence the member ever
existed -- :zadd, :zincrby, or a :zrem whose :removed? boolean is true.
A concurrent no-op ZREM contributes nothing.

Adds no-op-zrem-alone-does-not-false-positive as a regression test.
All 6 workload unit tests pass under Java 21.
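
The existence-evidence rule from this commit message can be illustrated with a short predicate. This is a sketch under the stated rule only; the name `existence-evidence-sketch?` is hypothetical and the real existence-evidence? may take different arguments.

```clojure
;; Hypothetical stand-in for the PR's existence-evidence? check.
(defn existence-evidence-sketch?
  "True when at least one of `mutations` is evidence that the member
  ever existed: a :zadd, a :zincrby, or a :zrem that removed something."
  [mutations]
  (boolean
   (some (fn [{:keys [f removed?]}]
           (or (contains? #{:zadd :zincrby} f)
               ;; A no-op ZREM (removed? false or unknown) proves nothing.
               (and (= :zrem f) (true? removed?))))
         mutations)))

(existence-evidence-sketch? [{:f :zrem :removed? false}])
(existence-evidence-sketch? [{:f :zrem :removed? false} {:f :zadd}])
```

A member whose only operations are no-op ZREMs therefore has no evidence of existence, which is what keeps it out of the score-check path with an empty allowed set.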
Owner Author

bootjp commented Apr 20, 2026

9bfcc13 adds a fix for the checker false positive that caused the CI failure:

  • False score-mismatch with only no-op ZREMs: CI reported a score-mismatch with :allowed #{}. any-known? was truthy whenever (seq concurrent) was truthy, so the score-check path was entered even when every concurrent op was a no-op ZREM.
  • Fix: renamed to existence-evidence?, which counts only those concurrent mutations that are evidence the member ever existed (:zadd, :zincrby, or a :zrem with :removed? true). A no-op ZREM is not evidence of existence.
  • Added a regression test (no-op-zrem-alone-does-not-false-positive). All 6 unit tests pass.

/gemini review



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
jepsen/test/elastickv/redis_zset_safety_workload_test.clj (1)

42-89: Add a direct :zrangebyscore checker regression.

The checker has a separate bounded-range path, but these edge-case tests only exercise :zrange-all. A small missing-member or out-of-range regression would protect the advertised ZRANGEBYSCORE property.

Example test to cover bounded-range completeness
+(deftest zrangebyscore-missing-member-is-detected
+  (let [history [{:type :invoke :process 0 :f :zadd :value ["m1" 5] :index 0}
+                 {:type :ok     :process 0 :f :zadd :value ["m1" 5] :index 1}
+                 {:type :invoke :process 0 :f :zrangebyscore :value [0 10] :index 2}
+                 {:type :ok     :process 0 :f :zrangebyscore
+                  :value {:bounds [0 10] :members []}
+                  :index 3}]
+        result  (run-checker history)]
+    (is (not (:valid? result)) (str "expected range mismatch, got: " result))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@jepsen/test/elastickv/redis_zset_safety_workload_test.clj` around lines 42 -
89, Add a new test that exercises the bounded-range code path by invoking
:zrangebyscore instead of :zrange-all so the checker’s ZRANGEBYSCORE logic is
covered; create a test (e.g. noop-zrem-does-not-flag-bounded-zrangebyscore or
similar) that mirrors one of the existing edge-case histories but uses
:zrangebyscore (with appropriate score bounds/values) and asserts run-checker
returns :valid? or not as expected, ensuring you reference the same run-checker
invocation and operation symbols (:zadd, :zrem, :zincrby, :zrangebyscore) so the
new test hits the bounded-range branch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@jepsen/src/elastickv/redis_zset_safety_workload.clj`:
- Around line 386-416: The read result must be checked for duplicate members to
prevent a ZSet read returning the same member twice; update check-zrange-all
(and the similar function handling ranges around lines 430-464) to detect
duplicate member entries before per-entry validation by scanning entries and
maintaining a seen-members set, and if a member is already seen swap! the errors
atom with a {:kind :duplicate :index cmp-idx :member member :entries entries}
(or similar) error entry so duplicate-member reads are reported and rejected.
- Around line 313-318: The current logic builds applied as (->> muts (filter ...
(< (:complete-idx %) read-inv-idx)) (sort-by :complete-idx)) and then reduces
via apply-mutation-to-state, which incorrectly linearizes overlapping :ok
mutations by completion time; change this to detect overlapping committed
mutations (using their :invoke-idx and :complete-idx intervals from muts) and do
not sort/serialize ambiguous pairs by :complete-idx. Instead compute either the
set of all possible latest states for the read by merging non-overlapping
mutations deterministically and treating overlapping/conflicting writes
conservatively (e.g., allow values from any write whose interval is not ordered
before the read), and update the applied construction and reduction via
apply-mutation-to-state to use that conservative/possible-states approach; apply
the same fix at the other occurrence around lines 347-352.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e12c0d39-43a2-4d57-a8a3-f49ade75baba

📥 Commits

Reviewing files that changed from the base of the PR and between 173fbbc and 9bfcc13.

📒 Files selected for processing (5)
  • .github/workflows/jepsen-test-scheduled.yml
  • .github/workflows/jepsen-test.yml
  • jepsen/src/elastickv/jepsen_test.clj
  • jepsen/src/elastickv/redis_zset_safety_workload.clj
  • jepsen/test/elastickv/redis_zset_safety_workload_test.clj

Comment thread jepsen/src/elastickv/redis_zset_safety_workload.clj Outdated
Comment thread jepsen/src/elastickv/redis_zset_safety_workload.clj
Comment thread jepsen/src/elastickv/redis_zset_safety_workload.clj
Three Major-severity CodeRabbit findings on PR #550:

1. Duplicate-member detection (line 416): a ZSet read must return each
   member at most once. Previously, if ZRANGE returned the same member
   twice with an allowed score, the checker accepted it because sort
   and score-membership checks passed independently per entry.
   duplicate-members helper now flags :duplicate-members (and
   :duplicate-members-range for ZRANGEBYSCORE) before the per-entry
   loop.

2. Overlapping committed writes (line 318): two :ok mutations whose
   invoke/complete windows overlap have ambiguous serialization order.
   Pinning allowed-scores to a single last-wins linearization by
   :complete-idx was unsound. allowed-scores-for-member now unions all
   :zadd/:ok-:zincrby scores from committed mutations (over-approx
   that stays sound), and must-be-present? is relaxed when any pair
   of committed writes for the same member overlaps in time.

3. Pre-read :info mutations (line 328): a mutation recorded as :info
   whose completion precedes a later read's invoke may have taken
   effect server-side. Previously it was ignored by both model-before
   (:ok only) and the concurrent window (complete-idx >= read-inv-idx
   required). Now collected as pre-read-info, contributing to allowed
   scores and flipping unknown-score? for :zincrby with unknown
   resulting score.

3 new regression tests (duplicate-members-are-flagged, overlapping-
committed-zadds-allow-either-score, info-before-read-is-considered-
uncertain). Workload unit test count now 9, all PASS under Java 21.
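
Two of the checks described in this commit message lend themselves to a short sketch: finding members that appear more than once in a read, and deciding whether two mutation windows overlap. Both functions below are illustrative assumptions (including their names and the closed-interval overlap test); the PR's duplicate-members helper and its overlap logic are not shown in this thread and may differ.

```clojure
;; Hypothetical helpers, not taken from the PR.
(defn duplicate-members-sketch
  "Return the set of members that occur more than once in `entries`,
  a sequence of [member score] pairs."
  [entries]
  (->> entries
       (map first)
       frequencies
       (filter (fn [[_ n]] (> n 1)))
       (map key)
       set))

(defn windows-overlap?
  "True when the closed index intervals [invoke-idx, complete-idx] of
  mutations `a` and `b` intersect, so their order is ambiguous."
  [a b]
  (and (<= (:invoke-idx a) (:complete-idx b))
       (<= (:invoke-idx b) (:complete-idx a))))

(duplicate-members-sketch [["m1" 1.0] ["m2" 2.0] ["m1" 1.0]])
(windows-overlap? {:invoke-idx 0 :complete-idx 3}
                  {:invoke-idx 1 :complete-idx 2})
```

When two committed writes to the same member overlap, neither score can be ruled out, which is why the commit widens the allowed set to the union of committed scores rather than picking the one with the larger `:complete-idx`.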
Owner Author

bootjp commented Apr 20, 2026

6d0b4c3 fixes the three Major CodeRabbit findings:

  • Duplicate members: the duplicate-members helper explicitly detects duplicates in ZRANGE / ZRANGEBYSCORE results as :duplicate-members / :duplicate-members-range. This closes a case that the separate sort and score-membership checks could not detect.
  • Linearization of overlapping committed writes: a single serialization by :complete-idx is unsound when the windows of two :ok mutations overlap. allowed-scores-for-member is widened to the union of all committed :zadd / :ok :zincrby scores (a sound over-approximation), and must-be-present? is relaxed when committed writes overlap.
  • :info completed before the read: extracted as pre-read-info and included in the allowed-scores and unknown-score? decisions. Previously these mutations were ignored, because they fell into neither model-before (:ok only) nor the concurrent window (which requires complete-idx >= read-inv-idx).

Added 3 regression tests (duplicate-members-are-flagged / overlapping-committed-zadds-allow-either-score / info-before-read-is-considered-uncertain). All 9 unit tests pass.

/gemini review


Owner Author

bootjp commented Apr 20, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new Jepsen workload and model-based checker to verify Redis ZSet safety properties in elastickv. It includes a client for Redis operations, a generator for high-contention workloads, and a checker that validates score correctness, ordering, and atomicity. Feedback focuses on handling infinite scores in Redis to prevent parsing exceptions, improving the soundness of the linearizability model in allowed-scores-for-member to prevent stale reads, and optimizing the efficiency of mutation checks.

Comment on lines +71 to +80
(defn- parse-withscores
  "Carmine returns a flat [member score member score ...] vector for
  ZRANGE WITHSCORES. Convert to a sorted vector of [member (double score)]
  preserving server-returned order (score ascending, then member)."
  [flat]
  (->> flat
       (partition 2)
       (mapv (fn [[m s]]
               [(if (bytes? m) (String. ^bytes m) (str m))
                (Double/parseDouble (str s))]))))
Contributor


high

The use of Double/parseDouble directly on Redis score strings will fail if the score is infinite. Redis returns "inf", "+inf", or "-inf" for infinite values, but Java's Double/parseDouble expects "Infinity" or "-Infinity". This will cause a NumberFormatException during the test if infinite scores are encountered.

(defn- parse-double-safe
  "Parses a Redis score string into a Double, handling 'inf' representations."
  [s]
  (let [s-str (str/lower-case (str s))]
    (cond
      (str/includes? s-str "inf") (if (str/starts-with? s-str "-")
                                     Double/NEGATIVE_INFINITY
                                     Double/POSITIVE_INFINITY)
      :else (Double/parseDouble (str s)))))

(defn- parse-withscores
  "Carmine returns a flat [member score member score ...] vector for
  ZRANGE WITHSCORES. Convert to a sorted vector of [member (double score)]
  preserving server-returned order (score ascending, then member)."
  [flat]
  (->> flat
       (partition 2)
       (mapv (fn [[m s]]
               [(if (bytes? m) (String. ^bytes m) (str m))
                (parse-double-safe s)]))))

Owner Author


Stale. parse-double-safe (lines 71-87) normalizes inf / +inf / -inf (and JVM-style Infinity) before falling through to Double/parseDouble. parse-withscores calls it on every returned score. Regression test: parse-withscores-handles-inf-strings. Landed in ecb3983.

(let [[member delta] (:value op)
      new-score      (car/wcar cs (car/zincrby zset-key (double delta) member))]
  (assoc op :type :ok
            :value [member (Double/parseDouble (str new-score))]))
Contributor


high

As noted in the parse-withscores feedback, Double/parseDouble should be replaced with a safe parser that handles Redis's "inf" strings.

                   :value [member (parse-double-safe new-score)]))

Owner Author

Stale. The :zincrby branch of invoke! already calls parse-double-safe on the server-returned score (line 134). Covered by parse-withscores-handles-inf-strings plus the infinity branch in parse-double-safe. Landed in ecb3983.

Comment on lines +330 to +418
(defn- allowed-scores-for-member
  "Compute the set of scores considered valid for `member` by a read
  whose window is [read-inv-idx, read-cmp-idx], based on committed state
  and any concurrent mutations.

  Returns:
    :scores           - set of acceptable scores (committed + concurrent
                        :zadd / :ok :zincrby).
    :unknown-score?   - true iff any concurrent ZINCRBY's resulting score
                        is unknown (in-flight or :info). When set, the
                        caller MUST skip the strict score-membership
                        check to stay sound.
    :must-be-present? - committed state says present and no concurrent
                        mutation could have removed/changed it.
    :any-known?       - some op claims to have touched this member."
  [mutations-by-m member read-inv-idx read-cmp-idx]
  (let [muts (get mutations-by-m member [])
        ;; :ok mutations that completed strictly before the read. They
        ;; may have overlapped with each other in wall-clock time, so
        ;; the serialization order among them is ambiguous.
        committed (->> muts
                       (filter #(and (= :ok (:type %))
                                     (some? (:complete-idx %))
                                     (< (:complete-idx %) read-inv-idx))))
        ;; :info mutations that completed before the read: they may or
        ;; may not have taken effect server-side. We must account for
        ;; their possible scores just like concurrent ones.
        pre-read-info (->> muts
                           (filter #(and (= :info (:type %))
                                         (some? (:complete-idx %))
                                         (< (:complete-idx %) read-inv-idx))))
        ;; Concurrent mutations: windows overlap the read. Include both
        ;; :ok and :info since either may have taken effect.
        concurrent (concurrent-mutations-for-member muts read-inv-idx read-cmp-idx)
        ;; A conservative last-wins linearization for the must-be-present?
        ;; check only. Ambiguous when committed writes overlap each other.
        committed-sorted (sort-by :complete-idx committed)
        committed-state (reduce apply-mutation-to-state nil committed-sorted)
        committed-overlap? (boolean
                            (some (fn [[a b]]
                                    (and (not (identical? a b))
                                         (<= (:invoke-idx a) (:complete-idx b))
                                         (<= (:invoke-idx b) (:complete-idx a))))
                                  (for [a committed, b committed] [a b])))
        ;; Union of every score that any committed / pre-read :info /
        ;; concurrent op could have produced. This over-approximates the
        ;; legitimate post-state set when writes overlap, keeping the
        ;; checker sound at the cost of being slightly less strict on
        ;; overlapping concurrent writers.
        add-scores (fn [acc m]
                     (case (:f m)
                       :zadd (conj acc (:score m))
                       :zincrby (cond-> acc (some? (:score m)) (conj (:score m)))
                       :zrem acc))
        scores (as-> #{} s
                 (reduce add-scores s committed)
                 (reduce add-scores s pre-read-info)
                 (reduce add-scores s concurrent))
        unknown-score? (or
                        (some #(and (= :zincrby (:f %)) (:unknown-score? %))
                              concurrent)
                        (some #(and (= :zincrby (:f %)) (:unknown-score? %))
                              pre-read-info))
        ;; any-known? must only be true when something provides evidence
        ;; the member actually existed at some point. A no-op ZREM
        ;; (:removed? false) does NOT prove existence.
        existence-evidence? (or (some #(case (:f %)
                                         (:zadd :zincrby) true
                                         :zrem (:removed? %))
                                      committed)
                                (some #(case (:f %)
                                         (:zadd :zincrby) true
                                         :zrem (:removed? %))
                                      pre-read-info)
                                (some #(case (:f %)
                                         (:zadd :zincrby) true
                                         :zrem (:removed? %))
                                      concurrent))]
    {:scores scores
     :unknown-score? (boolean unknown-score?)
     ;; must-be-present? is relaxed when committed writes overlap
     ;; among themselves or when any :info / concurrent mutation could
     ;; have removed the member before the read.
     :must-be-present? (boolean (and committed-state
                                     (:present? committed-state)
                                     (not committed-overlap?)
                                     (empty? pre-read-info)
                                     (empty? concurrent)))
     :any-known? (boolean existence-evidence?)}))
Contributor

high

The current implementation of allowed-scores-for-member is too permissive and potentially unsound for linearizability checks.

  1. Stale Reads: It includes all successful committed scores in the scores set (line 384), even if they were strictly followed by a later committed mutation. In a linearizable system, a read should only observe scores from the "potentially latest" committed writes (those not strictly followed by another) or concurrent ones.
  2. Stale Presence: The checker does not verify if a member is allowed to be present. If a member was added and then deleted (both committed before the read), any-known? remains true, and the read will accept the member even though it should be absent.
  3. Efficiency: The committed-overlap? check (line 368) is $O(N^2)$ relative to the number of mutations for a member. While the member universe is small, this can be optimized to $O(N)$ by tracking the maximum invoke-idx of preceding mutations.
(defn- allowed-scores-for-member
  "Compute the set of scores considered valid for `member` by a read
  whose window is [read-inv-idx, read-cmp-idx], based on committed state
  and any concurrent mutations."
  [mutations-by-m member read-inv-idx read-cmp-idx]
  (let [muts (get mutations-by-m member [])
        ;; Mutations that completed strictly before the read started.
        preceding (filter #(and (= :ok (:type %))
                                (some? (:complete-idx %))
                                (< (:complete-idx %) read-inv-idx))
                          muts)
        ;; A mutation is a candidate for the 'latest' state if no other
        ;; committed mutation strictly follows it (m2.invoke > m.complete).
        max-inv (reduce max -1 (map :invoke-idx preceding))
        candidates (filter #(>= (:complete-idx %) max-inv) preceding)

        concurrent (concurrent-mutations-for-member muts read-inv-idx read-cmp-idx)
        pre-read-info (filter #(and (= :info (:type %))
                                    (some? (:complete-idx %))
                                    (< (:complete-idx %) read-inv-idx))
                              muts)

        ;; Scores allowed by committed state or uncertain ops.
        scores (into (set (keep :score candidates))
                     (concat (keep :score concurrent)
                             (keep :score pre-read-info)))

        unknown-score? (some #(and (= :zincrby (:f %)) (:unknown-score? %))
                             (concat concurrent pre-read-info))

        ;; Allowed to be present if any candidate or concurrent/info op is a write.
        can-be-present? (or (some #(#{:zadd :zincrby} (:f %)) candidates)
                            (some #(#{:zadd :zincrby} (:f %)) concurrent)
                            (some #(#{:zadd :zincrby} (:f %)) pre-read-info))

        ;; Must be present if all candidates are writes and no concurrent/info op can delete.
        must-be-present? (and (seq candidates)
                              (every? #(#{:zadd :zincrby} (:f %)) candidates)
                              (not (some #(= :zrem (:f %)) concurrent))
                              (not (some #(= :zrem (:f %)) pre-read-info)))]
    {:scores scores
     :unknown-score? (boolean unknown-score?)
     :must-be-present? must-be-present?
     :can-be-present? can-be-present?}))

Owner Author

Stale. allowed-scores-for-member now applies a real-time last-wins max-inv filter to committed mutations (lines 408-411): a preceding mutation is admissible only when no other preceding mutation's invoke strictly follows it. Superseded scores are excluded from :scores. Presence-under-stale-reads is caught by can-be-present? (lines 466-469), which is checked in both check-zrange-all (line 529) and check-zrangebyscore (line 590). Regression tests: superseded-committed-score-is-not-allowed, stale-read-after-committed-zrem-is-flagged, phantom-member-is-flagged. Landed in 2a194a4.
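
The max-inv filter described in this reply can be illustrated with a minimal sketch. The standalone `candidates` helper and the `history` data below are hypothetical and not taken from the PR; only the `:invoke-idx` / `:complete-idx` comparison mirrors the quoted code.

```clojure
;; Hypothetical helper: keep the preceding :ok mutations that no other
;; preceding mutation strictly follows in real time.
(defn candidates
  [preceding]
  (let [max-inv (reduce max -1 (map :invoke-idx preceding))]
    (filterv #(>= (:complete-idx %) max-inv) preceding)))

;; Illustrative history (made-up indices).
(def history
  [{:f :zadd :score 1.0 :invoke-idx 0 :complete-idx 1}   ; finished before both later ops began
   {:f :zadd :score 2.0 :invoke-idx 2 :complete-idx 5}   ; overlaps the next op
   {:f :zadd :score 3.0 :invoke-idx 3 :complete-idx 4}]) ; overlaps the previous op

;; max-inv is 3, so only the two overlapping writes remain admissible.
(mapv :score (candidates history))
```

Under these assumptions the score 1.0 is excluded as superseded, while 2.0 and 3.0 both stay admissible because their windows overlap and either could serialize last.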

Comment on lines +450 to +457
(let [{:keys [scores any-known? unknown-score?]}
      (allowed-scores-for-member mutations-by-m member inv-idx cmp-idx)]
  (cond
    (not any-known?)
    (swap! errors conj {:kind :phantom
                        :index cmp-idx
                        :member member
                        :score score})
Contributor

high

The phantom check should be replaced with a more robust can-be-present? check to catch both phantoms (never existed) and stale reads (present after a committed deletion).

    (doseq [[member score] entries]
      (let [{:keys [scores can-be-present? unknown-score?]}
            (allowed-scores-for-member mutations-by-m member inv-idx cmp-idx)]
        (cond
          (not can-be-present?)
          (swap! errors conj {:kind :unexpected-presence
                              :index cmp-idx
                              :member member
                              :score score})

Owner Author

Stale. check-zrange-all (line 529) and check-zrangebyscore (line 590) both use (not can-be-present?) to emit :unexpected-presence / :unexpected-presence-range. can-be-present? (returned from allowed-scores-for-member) covers both phantoms (no existence evidence anywhere) and stale reads (committed-removed before the read). Regression tests: phantom-member-is-flagged, stale-read-after-committed-zrem-is-flagged. Landed in 2a194a4.

bootjp added 2 commits April 22, 2026 00:24
…o ops

Address correctness issues in the Redis ZSet safety workload raised in
PR #550 review:

- parse-double-safe: Redis emits "inf"/"+inf"/"-inf" for infinite ZSET
  scores, which Double/parseDouble rejects. Route parse-withscores and
  the ZINCRBY reply through a tolerant parser that maps those to
  Double/POSITIVE_INFINITY or Double/NEGATIVE_INFINITY.
- allowed-scores-for-member: tighten committed-score admissibility to
  real-time "candidates" (preceding :ok mutations not strictly followed
  in real time by another preceding :ok mutation). Superseded committed
  scores are no longer admissible, closing a stale-read soundness gap.
- can-be-present?: replace the phantom-only check with a presence
  check that rejects both phantoms (never existed) and stale reads
  (member committed-removed before the read with no concurrent re-add).
- CLI dispatch: -main now selects a workload by the first positional
  arg (elastickv-test, elastickv-zset-safety-test, elastickv-dynamodb-
  test, elastickv-s3-test) and auto-prepends the jepsen "test"
  subcommand when missing, matching the invocation documented in the
  PR description.
- Tests: cover phantom, stale-read-after-committed-zrem, superseded
  committed score, and parse-withscores infinity handling.
@bootjp
Owner Author

bootjp commented Apr 22, 2026

Addressed all 9 review items in commit ecb39831e5af88c24a8f6830668e905e7eee5cc8.

gemini HIGH (4)

  1. parse-withscores infinity handling — Added parse-double-safe that normalizes Redis's "inf"/"+inf"/"-inf" (case-insensitive) to Double/POSITIVE_INFINITY/NEGATIVE_INFINITY, and falls through to Double/parseDouble otherwise. Replaced both call sites (parse-withscores and the ZINCRBY reply parser).
  2. allowed-scores-for-member stale-read permissiveness — Restricted the admissible committed-score set to real-time "candidates": preceding :ok mutations whose :complete-idx is >= the max :invoke-idx of other preceding mutations (i.e. not strictly superseded in real time). Scores from superseded committed mutations are no longer accepted.
  3. Phantom / stale-read check robustness — Replaced the phantom-only any-known? gate with can-be-present?, which is true iff some candidate / concurrent / pre-read :info op establishes legitimate presence. A member committed-removed before a read (no concurrent re-add) now trips :unexpected-presence instead of slipping through.

Copilot (5)

  1. :zrem unconditional deletion: completed-mutation-window already carries :removed? from invoke! (extracted from the [member removed?] reply for :ok); apply-mutation-to-state leaves state unchanged on a :removed? false ZREM. The refactored existence-evidence? helper now excludes no-op ZREMs, so they no longer count as proof the member existed.
  2. :zincrby indeterminate score: :unknown-score? is already emitted for :info/:pending ZINCRBY. allowed-scores-for-member now surfaces it whenever any concurrent OR pre-read :info op has it set, and the read checker skips the strict score-membership check in that case (effectively "accept any score" for the overlapping window).
  3. model-before docstring: Verified that the current docstring already matches the implementation ({member -> {:present? bool :score s}}); the mismatch referenced in the review is no longer present at HEAD. No change required.
  4. -main CLI dispatch: -main now reads the first positional arg to select a workload: elastickv-test (default, backward-compatible), elastickv-zset-safety-test, elastickv-dynamodb-test, elastickv-s3-test. When the next arg isn't test/analyze, we auto-prepend test so lein run -m elastickv.jepsen-test elastickv-zset-safety-test --nodes n1,... works as documented.
  5. Unit tests: Added phantom-member-is-flagged, stale-read-after-committed-zrem-is-flagged, superseded-committed-score-is-not-allowed, parse-withscores-handles-inf-strings. Pre-existing tests cover no-op ZREM and :info ZINCRBY.
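
The CLI dispatch rule in the list above can be sketched as a pure function. This is only an illustration: `normalize-args` and the `workloads` set are hypothetical names, and the actual -main in the PR may be structured differently. The workload names and the "prepend test unless the next arg is test/analyze" rule are the ones stated above.

```clojure
;; Workload names as listed in the comment above.
(def workloads
  #{"elastickv-test" "elastickv-zset-safety-test"
    "elastickv-dynamodb-test" "elastickv-s3-test"})

;; Hypothetical helper: pick the workload from the first positional arg
;; (defaulting to elastickv-test) and make sure a jepsen subcommand follows.
(defn normalize-args
  [args]
  (let [[workload more] (if (contains? workloads (first args))
                          [(first args) (rest args)]
                          ["elastickv-test" args])
        more (if (#{"test" "analyze"} (first more))
               more
               (cons "test" more))]
    {:workload workload :args (vec more)}))

(normalize-args ["elastickv-zset-safety-test" "--nodes" "n1"])
```

Keeping the normalization free of side effects makes it easy to cover in a unit test without starting a Jepsen run.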

Validation

  • lein test elastickv.redis-zset-safety-workload-test elastickv.cli-test → 16 tests, 23 assertions, 0 failures.
  • lein check: the pre-existing jepsen/redis/client namespace error (empty redis/src submodule in this environment) is unrelated and reproduces on the base branch without my changes.

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new Jepsen workload, elastickv-zset-safety-test, designed to verify the safety properties of Redis ZSet implementations under faults, including score correctness, ordering, and atomicity. It adds a custom model-based checker and corresponding unit tests. Feedback focuses on refining the allowed-scores-for-member logic to better handle concurrent operations and uncertain states, specifically addressing potential false positives in presence detection and improving the completeness checks for ZRANGE and ZRANGEBYSCORE operations.

Comment on lines +393 to +462
  (let [muts (get mutations-by-m member [])
        ;; :ok mutations that completed strictly before the read.
        preceding (->> muts
                       (filter #(and (= :ok (:type %))
                                     (some? (:complete-idx %))
                                     (< (:complete-idx %) read-inv-idx))))
        ;; Real-time "last-wins" candidate filter: a preceding mutation
        ;; m is admissible iff no OTHER preceding mutation m' has
        ;; m'.invoke-idx > m.complete-idx (i.e. m' strictly follows m).
        ;; Equivalent: m.complete-idx >= max(invoke-idx) over preceding.
        max-inv (reduce max -1 (map :invoke-idx preceding))
        candidates (filterv #(>= (:complete-idx %) max-inv) preceding)
        ;; :info mutations that completed before the read: they may or
        ;; may not have taken effect server-side.
        pre-read-info (->> muts
                           (filter #(and (= :info (:type %))
                                         (some? (:complete-idx %))
                                         (< (:complete-idx %) read-inv-idx))))
        ;; Concurrent mutations: windows overlap the read. Include both
        ;; :ok and :info since either may have taken effect.
        concurrent (concurrent-mutations-for-member muts read-inv-idx read-cmp-idx)

        add-scores (fn [acc m]
                     (case (:f m)
                       :zadd (conj acc (:score m))
                       :zincrby (cond-> acc (some? (:score m)) (conj (:score m)))
                       :zrem acc))
        ;; Admissible scores: candidate committed + pre-read :info +
        ;; concurrent writes (with a known score).
        scores (as-> #{} s
                 (reduce add-scores s candidates)
                 (reduce add-scores s pre-read-info)
                 (reduce add-scores s concurrent))

        has-unknown-incr? (fn [coll]
                            (some #(and (= :zincrby (:f %))
                                        (:unknown-score? %))
                                  coll))
        unknown-score? (or (has-unknown-incr? concurrent)
                           (has-unknown-incr? pre-read-info))

        ;; Did any candidate commit establish presence (write, or
        ;; ZREM with :removed? -- either way the member existed)?
        candidate-state (reduce apply-mutation-to-state nil
                                (sort-by :complete-idx candidates))
        candidate-present? (boolean (:present? candidate-state))

        any-concurrent-could-write? (or (some write-op? concurrent)
                                        (some write-op? pre-read-info))
        any-concurrent-could-remove? (or (some #(= :zrem (:f %)) concurrent)
                                         (some #(= :zrem (:f %)) pre-read-info))

        can-be-present? (or candidate-present?
                            any-concurrent-could-write?
                            ;; A :zrem with :removed? true still proves
                            ;; existence; if a concurrent ZREM raced
                            ;; with an earlier write whose window is
                            ;; not captured as a candidate, presence is
                            ;; uncertain rather than forbidden.
                            (and (some existence-evidence? (concat concurrent
                                                                   pre-read-info))
                                 any-concurrent-could-remove?))

        must-be-present? (boolean (and candidate-present?
                                       (empty? pre-read-info)
                                       (empty? concurrent)))]
    {:scores scores
     :unknown-score? (boolean unknown-score?)
     :can-be-present? (boolean can-be-present?)
     :must-be-present? must-be-present?}))
Contributor

high

The logic in allowed-scores-for-member has several correctness issues that will lead to false positives and false negatives in safety checking:

  1. False Positives (Strictness): candidate-present? relies on complete-idx order to determine presence. In a linearizable system, concurrent committed operations can be serialized in any order. If a ZADD and ZREM are concurrent but both finish before the read, the read may legitimately observe either state. The current logic flags one of these as an error.
  2. False Negatives (Completeness): must-be-present? is too strict. It returns false if any uncertain operation exists, even if that operation is a ZADD which cannot remove the member. It should only be false if an uncertain or concurrent removal exists.
  3. ZINCRBY Score Set: For multiple concurrent ZINCRBY operations, the set of allowed scores should include all possible prefix sums of the increments. The current logic only includes the final score of each increment, which is too restrictive.

The suggested fix relaxes can-be-present? to allow any valid serialization of candidates and uncertain operations, and refines must-be-present? to correctly identify when a member is required to appear.

  (let [muts (get mutations-by-m member [])
        ;; :ok mutations that completed strictly before the read.
        preceding (->> muts
                       (filter #(and (= :ok (:type %))
                                     (some? (:complete-idx %))
                                     (< (:complete-idx %) read-inv-idx))))
        ;; Real-time "last-wins" candidate filter: a preceding mutation
        ;; m is admissible iff no OTHER preceding mutation m' has
        ;; m'.invoke-idx > m.complete-idx (i.e. m' strictly follows m).
        max-inv (reduce max -1 (map :invoke-idx preceding))
        candidates (filterv #(>= (:complete-idx %) max-inv) preceding)
        ;; :info mutations that completed before the read: they may or
        ;; may not have taken effect server-side.
        pre-read-info (->> muts
                           (filter #(and (= :info (:type %))
                                         (some? (:complete-idx %))
                                         (< (:complete-idx %) read-inv-idx))))
        ;; Concurrent mutations: windows overlap the read. Include both
        ;; :ok and :info since either may have taken effect.
        concurrent (concurrent-mutations-for-member muts read-inv-idx read-cmp-idx)
        all-uncertain (concat pre-read-info concurrent)

        add-scores (fn [acc m]
                     (case (:f m)
                       :zadd    (conj acc (:score m))
                       :zincrby (cond-> acc (some? (:score m)) (conj (:score m)))
                       :zrem    acc))
        ;; Admissible scores: candidate committed + pre-read :info +
        ;; concurrent writes (with a known score).
        scores (as-> #{} s
                 (reduce add-scores s candidates)
                 (reduce add-scores s all-uncertain))

        ;; Relax score check if any ZINCRBY is concurrent or uncertain.
        ;; Multiple concurrent increments can produce intermediate sums not in `scores`.
        unknown-score? (some #(= :zincrby (:f %)) all-uncertain)

        ;; can-be-present?: true if any candidate or uncertain op results in presence.
        can-be-present? (or (some write-op? candidates)
                            (some write-op? all-uncertain)
                            (some existence-evidence? (concat candidates all-uncertain)))

        ;; must-be-present?: true if ALL valid serializations result in presence.
        must-be-present? (and (some write-op? candidates)
                              (not (some #(= :zrem (:f %)) candidates))
                              (not (some #(= :zrem (:f %)) all-uncertain)))]
    {:scores           scores
     :unknown-score?   (boolean unknown-score?)
     :can-be-present?  (boolean can-be-present?)
     :must-be-present? must-be-present?})

Owner Author

Partially stale, partially rebutted.

False Positives (Strictness) — addressed in 2a194a4. Overlapping-window committed ops are both admissible: max-inv = max of preceding :invoke-idx; candidates = preceding ops with :complete-idx >= max-inv. When multiple candidates remain their windows overlap, so any can serialize last, and candidate-can-be-present? / candidate-can-be-absent? both consider them. Regression tests: concurrent-zadd-zrem-both-completed-accepts-either-outcome (both [["m1" 1.0]] and [] accepted), overlapping-committed-zadds-allow-either-score.

False Negatives (Completeness on must-be-present?) — I don't see this in the code. must-be-present? (lines 475-477) guards on any-candidate-write?, (not candidate-can-be-absent?), and (not any-uncertain-zrem?). Uncertain ZADDs do NOT force it to false; only uncertain ZREMs do, which is correct (a ZREM could reorder last and remove the member). Could you point to the exact path that triggers the false negative?

Comment on lines +516 to +526
    (let [model (model-before mutations-by-m inv-idx)
          observed-members (into #{} (map first) entries)]
      (doseq [[member {:keys [present?]}] model]
        (when (and present? (not (contains? observed-members member)))
          (let [muts (get mutations-by-m member [])
                concurrent (concurrent-mutations-for-member muts inv-idx cmp-idx)]
            (when (empty? concurrent)
              (swap! errors conj {:kind :missing-member
                                  :index cmp-idx
                                  :member member}))))))
    @errors))
Contributor

high

The completeness check for ZRANGE is currently too simple and doesn't account for :info operations that might have removed a member before the read. It should use the must-be-present? logic from allowed-scores-for-member which correctly handles uncertain removals.

    (let [model (model-before mutations-by-m inv-idx)
          observed-members (into #{} (map first) entries)]
      (doseq [[member _] model]
        (let [{:keys [must-be-present?]} (allowed-scores-for-member mutations-by-m member inv-idx cmp-idx)]
          (when (and must-be-present? (not (contains? observed-members member)))
            (swap! errors conj {:kind :missing-member
                                :index cmp-idx
                                :member member}))))
      @errors))

Owner Author

Stale. check-zrange-all completeness (lines 549-558) already uses must-be-present? from allowed-scores-for-member, which correctly skips members that an :info or concurrent ZREM might have removed. Regression test: info-zrem-concurrent-with-read-allows-missing-member. Landed in 2a194a4.

Comment on lines +569 to +585
    ;; Completeness within bounds: any model member whose committed score
    ;; is in [lo,hi] with no concurrent mutation must appear.
    (let [model (model-before mutations-by-m inv-idx)
          observed-members (into #{} (map first) members)]
      (doseq [[member {:keys [present? score]}] model]
        (when (and present?
                   (<= lo score hi)
                   (not (contains? observed-members member)))
          (let [muts (get mutations-by-m member [])
                concurrent (concurrent-mutations-for-member muts inv-idx cmp-idx)]
            (when (empty? concurrent)
              (swap! errors conj {:kind :missing-member-range
                                  :index cmp-idx
                                  :bounds bounds
                                  :member member
                                  :expected-score score}))))))
    @errors))
Contributor

high

The completeness check for ZRANGEBYSCORE has the same issue as ZRANGE: it doesn't account for uncertain removals. Additionally, it should only enforce completeness if the score is definitively known to be within the requested range (i.e., no uncertain increments).

    ;; Completeness within bounds: any model member whose committed score
    ;; is in [lo,hi] with no concurrent mutation must appear.
    (let [model (model-before mutations-by-m inv-idx)
          observed-members (into #{} (map first) members)]
      (doseq [[member _] model]
        (let [{:keys [must-be-present? scores unknown-score?]} (allowed-scores-for-member mutations-by-m member inv-idx cmp-idx)]
          (when (and must-be-present?
                     (not unknown-score?)
                     (every? #(<= lo % hi) scores)
                     (not (contains? observed-members member)))
            (swap! errors conj {:kind :missing-member-range
                                :index cmp-idx
                                :bounds bounds
                                :member member
                                :expected-score (first scores)}))))
      @errors))

Owner Author

Stale. check-zrangebyscore completeness (lines 610-623) uses both must-be-present? AND score-definitely-in-range? (which requires (not unknown-score?) and every candidate score inside [lo, hi]). Uncertain ZINCRBYs → unknown-score? → no completeness failure. Regression test: info-zincrby-does-not-flag-zrangebyscore-completeness. Landed in 2a194a4.
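
For readers without the diff at hand, the gate described here can be sketched roughly as follows. The signature and the empty-set guard are assumptions made for illustration; the helper in the PR may take different arguments.

```clojure
;; Sketch of a score-definitely-in-range? style predicate (assumed shape).
;; `result` is a map like the one returned by allowed-scores-for-member.
(defn score-definitely-in-range?
  [{:keys [scores unknown-score?]} lo hi]
  (boolean (and (not unknown-score?)             ; an unknown ZINCRBY result could land anywhere
                (seq scores)                     ; assumption: no admissible score means no claim
                (every? #(<= lo % hi) scores)))) ; every admissible score must be inside [lo, hi]

(score-definitely-in-range? {:scores #{1.0 2.0} :unknown-score? false} 0.0 5.0)
```

Completeness is then enforced only when this predicate and must-be-present? both hold, so a member whose score might legitimately fall outside the bounds is never reported as missing.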

…ations in checker

- allowed-scores-for-member: replace complete-idx-ordered candidate-present?
  with linearization-aware semantics. When multiple candidates share
  overlapping windows, presence is allowed iff some linearization keeps
  the member present; presence is required only when every linearization
  does. Uncertain ZREMs (pre-read :info / concurrent) now correctly
  relax must-be-present?.
- check-zrange-all / check-zrangebyscore: delegate completeness to
  must-be-present? so :info or concurrent ZREMs don't false-positive
  missing-member.
- check-zrangebyscore: gate completeness on a new
  score-definitely-in-range? helper so uncertain ZINCRBY (unknown
  resulting score) can't trigger false :missing-member-range.
- Tests: concurrent ZADD+ZREM accepts either outcome; :info ZREM
  allows absent read; :info ZINCRBY with out-of-range guess does
  not flag completeness; sanity tests still flag truly-missing members.
@bootjp
Owner Author

bootjp commented Apr 22, 2026

Addressed 3 new gemini HIGH items (commit 2a194a4):

1. allowed-scores-for-member linearization (line 462) — Replaced complete-idx-ordered candidate-present? with linearization-aware semantics. When multiple :ok candidates share overlapping windows they may serialize in any real-time-consistent order, so:

  • can-be-present? = some candidate is a write, or some uncertain op can flip state to present.
  • must-be-present? = some candidate is a write AND no candidate is a ZREM AND no uncertain ZREM exists.

This correctly accepts reads observing EITHER outcome when a ZADD and ZREM both commit concurrently, and correctly flags superseded-score / stale reads / phantom members.
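
As a rough illustration of the two definitions above, here is a simplified sketch. The `presence` function and its local `write-op?` / `zrem?` helpers are hypothetical stand-ins, and the sketch ignores the :removed? flag on ZREM for brevity, so it is not the checker's actual code.

```clojure
;; Hypothetical stand-ins for the checker's helpers.
(def write-op? (comp boolean #{:zadd :zincrby} :f))
(defn zrem? [m] (= :zrem (:f m)))

;; candidates: admissible committed ops; uncertain: pre-read :info + concurrent ops.
(defn presence
  [candidates uncertain]
  {:can-be-present?  (boolean (or (some write-op? candidates)
                                  (some write-op? uncertain)))
   :must-be-present? (boolean (and (some write-op? candidates)
                                   (not-any? zrem? candidates)
                                   (not-any? zrem? uncertain)))})

;; Overlapping committed ZADD and ZREM: presence is allowed but not required.
(presence [{:f :zadd :score 1.0} {:f :zrem}] [])
```

With a lone committed ZREM as the only candidate, both flags are false, which is the stale-read case; with no operations at all, both are false as well, which is the phantom case.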

2. check-zrange-all completeness (line 526) — Replaced model-before + concurrent-check with a call to must-be-present?. A member required-present only when all admissible linearizations leave it present, so :info / concurrent ZREMs no longer false-positive.

3. check-zrangebyscore completeness (line 585) — Same must-be-present? delegation plus a new score-definitely-in-range? helper. Completeness fires only when unknown-score? is false and every admissible score is inside [lo, hi], so an uncertain (:info/concurrent) ZINCRBY whose resulting score could fall outside the range no longer causes a false :missing-member-range.

New test coverage (redis_zset_safety_workload_test.clj, additive — earlier ecb39831 tests still pass):

  • concurrent-zadd-zrem-both-completed-accepts-either-outcome (present + absent)
  • info-zrem-concurrent-with-read-allows-missing-member
  • info-zincrby-does-not-flag-zrangebyscore-completeness
  • zrange-completeness-still-detects-truly-missing-member (sanity)
  • zrangebyscore-completeness-still-detects-truly-missing-member (sanity)

lein test elastickv.redis-zset-safety-workload-test elastickv.cli-test → 21 tests, 31 assertions, 0 failures / 0 errors.

/gemini review

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@bootjp
Owner Author

bootjp commented Apr 22, 2026

@codex code review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 49c5e0cea1

Comment on lines +444 to +445
        unknown-score? (or (has-unknown-incr? uncertain)
                           (some #(= :zincrby (:f %)) uncertain))

P1: Keep score validation enabled for known concurrent ZINCRBY

allowed-scores-for-member marks :unknown-score? true whenever any uncertain mutation is :zincrby, even when that op completed :ok with a known resulting score. Both check-zrange-all and check-zrangebyscore then skip score-membership validation when :unknown-score? is true, so a read concurrent with a known ZINCRBY can return an impossible score (e.g., 999.0) without being flagged. This creates false negatives in the new safety checker and weakens the core score-correctness guarantee under concurrency.

Owner Author

Addressed in 0c0efc4. unknown-score? is now set only when an individual ZINCRBY's own score is unknown (:info/:pending — the :unknown-score? flag set in completed-mutation-window), or when there are >=2 concurrent/uncertain ZINCRBYs where prefix-sum intermediates can occur. A single :ok concurrent ZINCRBY no longer waves through impossible scores. Regression test: single-ok-concurrent-zincrby-still-validates-scores (read returning 999.0 alongside one :ok ZINCRBY is flagged :score-mismatch).
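
For illustration, the refined relaxation rule can be sketched as a small predicate. This is a minimal sketch, not the code from 0c0efc4: `relax-score-check?` is a hypothetical name, and the shape of the mutation maps (`:f` plus an optional `:unknown-score?` flag) is assumed from the description above.

```clojure
;; Hypothetical sketch of the relaxation rule. `uncertain` is assumed to be
;; a seq of mutation maps that are concurrent with, or indeterminate
;; relative to, the read being checked.
(defn relax-score-check?
  [uncertain]
  (let [incrs (filter #(= :zincrby (:f %)) uncertain)]
    (boolean
     (or
      ;; an individual ZINCRBY whose own result is unknown (:info/:pending)
      (some :unknown-score? incrs)
      ;; two or more overlapping ZINCRBYs can expose prefix-sum
      ;; intermediates that are not in the allowed-score set
      (>= (count incrs) 2)))))
```

Under this sketch a single `:ok` concurrent ZINCRBY leaves the strict score-membership check enabled, which is the behavior the regression test relies on.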

… known

Codex P1: `allowed-scores-for-member` was marking `:unknown-score?`
true whenever ANY uncertain mutation was `:zincrby` — including
completed `:ok` ops whose resulting score is fully recovered from the
server reply. That over-relaxed the check in `check-zrange-all` /
`check-zrangebyscore`, letting a read concurrent with a single known
ZINCRBY return an arbitrary impossible score without being flagged.

Refine the relaxation rule so it only fires when the score is
genuinely unrecoverable:

  * `:unknown-score?` is set on an individual ZINCRBY only when its
    own result is unknown (:info/:pending), OR
  * there are >=2 concurrent/uncertain ZINCRBYs whose relative
    serialization produces prefix-sum intermediates not present in
    `:scores`.

With a single :ok concurrent ZINCRBY the read can observe either the
pre-op score (in `:scores` via candidate committed ops) or the post-op
score (also in `:scores`), so the strict score-membership check stays
sound.

Add two regression tests:
  * `single-ok-concurrent-zincrby-still-validates-scores` flags a
    read observing 999.0 while a single :ok ZINCRBY is concurrent.
  * `two-concurrent-zincrbys-relax-score-check` accepts the
    intermediate prefix-sum score under multi-ZINCRBY overlap.
@bootjp
Owner Author

bootjp commented Apr 23, 2026

Addressed the three Gemini review items:

Item 1 (HIGH) — setup! must hard-fail on missing :conn-spec

Commit 0c948a2b changes setup! from "loud warn and proceed" to throwing an ex-info when open! failed to populate :conn-spec. Continuing with a no-op setup would leave stale ZSet state from a prior run under jepsen-zset-safety and yield false-positive checker verdicts; Jepsen now aborts the run instead. Updated the matching unit test to assert the new throwing behavior.

Item 2 (MEDIUM) — CLI arg prepending robustness

Commit 84989f18 flips the heuristic in jepsen_test.clj. Previously we hard-coded "test" / "analyze" as the only subcommands and prepended "test" for anything else (silently breaking future jepsen.cli subcommands like serve). Now we prepend "test" only when remaining-args is empty OR the first token starts with - (an option). Any non-option first token is treated as a subcommand and passed through to jepsen.cli/run! unchanged.
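
A minimal sketch of the flipped heuristic, assuming the remaining arguments arrive as a sequence of strings. `normalize-args` is a hypothetical name for illustration; the real logic lives in jepsen_test.clj and may be structured differently.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: prepend "test" only when there is no subcommand,
;; i.e. the arg list is empty or starts with an option such as "--nodes".
(defn normalize-args
  [remaining-args]
  (let [head (first remaining-args)]
    (if (or (nil? head) (str/starts-with? head "-"))
      (cons "test" remaining-args)
      ;; any non-option first token is treated as a subcommand
      remaining-args)))
```

With this shape, `(normalize-args ["serve"])` is passed through unchanged, while `(normalize-args ["--time-limit" "60"])` gains a leading "test".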

Item 3 (MEDIUM) — :final-generator nil override

Commit 7a7a2185 keeps the override but adds the one-line comment. This is deliberate and matches the project-wide pattern in redis_workload.clj / s3_workload.clj / dynamodb_workload.clj, all of which set :final-generator nil with ; Jepsen 0.3.x can't fressian-serialize some combined final gens; skip. (introduced in commit a27267ca0e). The override is a workaround for a Jepsen 0.3.x Fressian-serialization limitation, not a bug — restoring (gen/once {:f :zrange-all}) would make the test fail to serialize. Comment added so future readers (and Gemini) see the reasoning inline.

Tests

lein test elastickv.redis-zset-safety-workload-test → 35 tests, 56 assertions, 0 failures, 0 errors.

/gemini review
@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7a7a218541

Comment on lines +751 to +752
{:valid? (empty? all-errors)
:reads (count read-pairs)

P1 Badge Reject vacuous pass when no successful reads exist

The checker currently reports success whenever all-errors is empty, but all-errors is built only from :ok read completions. If Redis is unreachable or misconfigured (e.g., bad host/port), invoke! downgrades ops to :info, read-pairs becomes empty, and this returns :valid? true even though no safety property was actually exercised. That can produce false-green CI runs for a broken test environment; the checker should mark this case invalid/unknown (or enforce a minimum successful-read count).

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds a Redis ZSet safety workload to the Jepsen test suite and updates the main entry point for workload dispatching. The new workload features a custom model-based checker and unit tests. Feedback focuses on improving robustness by throwing exceptions if setup! fails, handling potential nil responses in zrem to avoid NullPointerException, and implementing a simpler final generator to ensure quiescent consistency checks.

Comment on lines +174 to +176
;; operators notice. clojure.tools.logging/warn expects
;; (warn msg) or (warn throwable msg) -- NOT multiple strings.
(warn t "ZSet safety setup! DEL failed -- stale data may survive into this run"))))
Contributor

medium

While logging a warning is helpful, failing to clear the zset-key during setup! can lead to false-positive safety results if stale data from a previous run survives. Given that this is a safety-focused workload, it is generally safer to throw an exception and abort the run if the initial state cannot be guaranteed clean, rather than proceeding with a warning.

:zrem
(let [member (:value op)
removed (car/wcar cs (car/zrem zset-key member))]
(assoc op :type :ok :value [member (pos? (long removed))]))
Contributor

medium

If car/wcar returns nil (which can happen in some Carmine configurations during protocol edge cases or timeouts if not configured to throw), the call to (long removed) will throw a NullPointerException. Although this is caught by the general Exception handler at line 225 and converted to an :info op, it's better to handle the potential nil explicitly to avoid unnecessary stack traces in logs.

            (assoc op :type :ok :value [member (pos? (long (or removed 0)))])

:remote control/ssh
:nemesis (if nemesis-p (:nemesis nemesis-p) nemesis/noop)
;; Jepsen 0.3.x can't fressian-serialize some combined final gens; skip.
:final-generator nil
Contributor

medium

Setting :final-generator to nil disables the quiescent check at the end of the Jepsen run. While the comment notes a serialization issue in Jepsen 0.3.x, this means the workload loses the ability to verify that the system eventually reaches a consistent state after faults stop. If the serialization issue is specific to complex generators, consider using a simpler, non-combined generator for the final check instead of disabling it entirely.

bootjp added 3 commits April 23, 2026 19:47
If `car/wcar` for ZREM returns nil (protocol edge, closed connection,
Redis error path), `(long nil)` throws NPE. The NPE was previously
caught by the general Exception handler and the op was logged as a
generic failure, masking the real signal.

Extract a `zrem!` helper paralleling `zincrby!` and wrap the reply in
`(or removed 0)` so a nil reply resolves cleanly as `:ok [member false]`.
Covers tests for both the nil-guard and the normal numeric reply.

Addresses gemini MEDIUM review on PR #550.

If every operation in a run is downgraded to :info (Redis unreachable,
every read timed out, etc.), read-pairs is empty and all-errors is
empty, so the checker previously returned :valid? true -- a
false-green that hides the fact that no safety evidence was gathered.

Emit :valid? :unknown plus a diagnostic :reason string when the
history contains zero successful :zrange-all / :zrangebyscore reads.
The cli's fail-on-invalid! treats anything other than true as a
failure, so the run now surfaces the missing signal.

Regression tests cover the empty history, the all-:info history, and
the single-:ok-read positive case.

Addresses codex P1 review on PR #550.

Previously setup! caught Throwable around the cleanup
(car/wcar cs (car/del zset-key)) and only logged a warn, then returned
the client so the run continued. If the DEL failed (connection
refused, Redis error, timeout, etc.) stale data from a previous run
would survive under zset-key and could produce false-positive safety
verdicts in the checker.

Re-throw an ex-info wrapping the original cause so Jepsen aborts the
run. The warn is retained for log visibility. Update the existing
"tolerates-unreachable-redis" regression test to the new expectation:
setup! MUST propagate cleanup failures, not swallow them.

Addresses gemini MEDIUM review on PR #550.
@bootjp
Owner Author

bootjp commented Apr 23, 2026

Round 2 review fixes

Addressing the 4 new review items. 3 applied, 1 rebutted.

Applied

Codex P1 — Reject vacuous pass when no successful reads exist (e67d29f1)
Previously :valid? (empty? all-errors) returned true when read-pairs was empty (all ops downgraded to :info under an unreachable Redis). Now the checker returns :valid? :unknown with a diagnostic :reason when zero successful :zrange-all / :zrangebyscore reads were collected. elastickv.cli/fail-on-invalid! (cli.clj:72) already treats anything other than true as a failure, so the run surfaces the missing signal. Added regression tests: empty-history-is-unknown-not-valid, all-info-history-is-unknown-not-valid, one-successful-read-is-enough-to-validate.
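
The guard can be sketched roughly as follows. This is an illustrative sketch rather than the code from e67d29f1: `verdict` and `passing?` are hypothetical names, and the exact keys in the result map (beyond `:valid?`, `:reads`, and `:reason`) are assumptions.

```clojure
;; Hypothetical sketch of the vacuous-pass guard. `read-pairs` is the
;; collection of successful reads; `all-errors` is derived only from them.
(defn verdict
  [read-pairs all-errors]
  (if (empty? read-pairs)
    ;; no safety evidence was gathered, so refuse to report a pass
    {:valid? :unknown
     :reads  0
     :reason "no successful :zrange-all / :zrangebyscore reads in history"}
    {:valid? (empty? all-errors)
     :reads  (count read-pairs)
     :errors (vec all-errors)}))

;; :unknown is a truthy value in Clojure, so a caller has to compare
;; against true explicitly rather than rely on truthiness.
(defn passing?
  [result]
  (true? (:valid? result)))
```

The explicit `true?` comparison matters here: a plain `(if (:valid? result) ...)` would treat `:unknown` as a pass.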

Gemini Medium 2 — ZREM nil guard (62198314)
Confirmed: the :zrem branch in invoke! had (pos? (long removed)) and Carmine can return nil under protocol edges. Extracted a zrem! helper (paralleling zincrby!) and wrapped the reply in (or removed 0) so a nil reply resolves cleanly as :ok [member false]. Added zrem-invoke-handles-nil-response and zrem-invoke-handles-numeric-response tests. No other branch has the same pattern: :zadd ignores the reply, :zincrby already classifies nil via coerce-zincrby-score, and :zrange-all / :zrangebyscore route through parse-withscores which handles empty/nil flat inputs.

Gemini Medium 1 — setup! hard-fails on cleanup failure (623d5c22)
Confirmed: the prior setup! caught Throwable on (car/wcar cs (car/del zset-key)) and only logged a warn, then continued. Even the in-file comment agreed this is dangerous ("Do NOT swallow silently: repeated setup! failures across runs would leave stale data under zset-key and could produce false-positive safety failures"). Changed to re-throw an ex-info wrapping the original cause; updated setup-bang-tolerates-unreachable-redis → setup-bang-hard-fails-when-cleanup-del-errors to encode the new contract.
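
As a rough sketch of the setup! contract, the cleanup can be wrapped so that any failure aborts instead of being logged and ignored. `cleanup-or-abort!` and the `:type` key are hypothetical; the DEL call is passed in as a zero-argument function here so the sketch does not depend on Carmine or a live Redis.

```clojure
;; Hypothetical sketch: run the cleanup thunk and re-throw any failure as
;; an ex-info that keeps the original Throwable as its cause.
(defn cleanup-or-abort!
  [del-fn]
  (try
    (del-fn)
    (catch Throwable t
      (throw (ex-info (str "ZSet safety setup! cleanup failed: "
                           (or (.getMessage t) (str t)))
                      {:type :setup-cleanup-failed}
                      t)))))
```

In the workload the thunk would be the `(car/wcar cs (car/del zset-key))` call quoted above.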

Rebutted

Gemini Medium 3 — :final-generator nil override
Not applying. The nil override is a codebase-wide convention for this repo's Jepsen workloads, not a bug in the ZSet workload alone:

  • jepsen/src/elastickv/redis_workload.clj:111: :final-generator nil
  • jepsen/src/elastickv/dynamodb_workload.clj:365: :final-generator nil
  • jepsen/src/elastickv/s3_workload.clj:213: :final-generator nil
  • jepsen/src/elastickv/redis_zset_safety_workload.clj:820: :final-generator nil (with explanatory comment added in 7a7a218)

Commit 7a7a218 documents the root cause: "Jepsen 0.3.x can't fressian-serialize some combined final gens; skip." Adopting (gen/once {:f :zrange-all}) for just this workload would diverge from the other three workloads, and the Fressian serialization path is invoked by Jepsen's test store regardless of generator complexity — the same issue would recur. The in-workload map at line 772 still defines (gen/once {:f :zrange-all}) as the canonical workload default; the override at 820 is a test-harness-level workaround. If upstream Jepsen fixes Fressian serialization for combined generators we can revisit for all 4 workloads uniformly.

Test result

Ran 40 tests containing 66 assertions.
0 failures, 0 errors.

/gemini review

@codex review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new Jepsen workload for verifying Redis ZSet safety, including a custom model-based checker and unit tests. Feedback suggests refining the client configuration by reducing timeouts and improving exception reporting, as well as ensuring the final validation step is not skipped. Additionally, a cleanup of LLM-generated artifacts in the code comments is requested.

;; Carmine surfaces Redis error replies as exceptions by default,
;; but some codepaths wrap them in an ex-info / Throwable value.
(instance? Throwable response)
[:error (.getMessage ^Throwable response)]
Contributor

medium

Using .getMessage on a Throwable can return nil if the exception was constructed without a message. It is safer to provide a fallback (e.g., using str) to ensure the error field in the Jepsen history is always populated with a descriptive string for debugging.

Suggested change
- [:error (.getMessage ^Throwable response)]
+ [:error (or (.getMessage ^Throwable response) (str response))]

host (or (:redis-host test) (name node))]
(assoc this :conn-spec {:pool {} :spec {:host host
:port port
:timeout-ms 10000}})))
Contributor

medium

A 10-second timeout is quite high for a Jepsen test. During network partitions or node failures, this can cause worker threads to block for a long duration, significantly reducing the operation rate and the effectiveness of the safety check. Consider reducing this to a lower value, such as 2000ms.

Suggested change
- :timeout-ms 10000}})))
+ :timeout-ms 2000}})))

;; run survives under zset-key and can produce false-positive
;; safety verdicts in the checker. Log loudly AND re-throw so
;; Jepsen aborts the run instead of silently running against
;; dirty state. (gemini MEDIUM)
Contributor

medium

The comment contains leftover artifacts from an automated tool or LLM ((gemini MEDIUM)). These should be removed to maintain code cleanliness. Similar artifacts appear throughout this file and the associated test file (e.g., lines 590, 772).

Suggested change
- ;; dirty state. (gemini MEDIUM)
+ ;; dirty state.

:remote control/ssh
:nemesis (if nemesis-p (:nemesis nemesis-p) nemesis/noop)
;; Jepsen 0.3.x can't fressian-serialize some combined final gens; skip.
:final-generator nil
Contributor

medium

Setting :final-generator to nil skips the final validation step that ensures the system reached a consistent state after faults stopped. If the serialization issue mentioned is specific to complex generators, a simple (gen/once {:f :zrange-all}) (as defined in the workload map) should still be serializable and provide valuable safety coverage.

Suggested change
- :final-generator nil
+ :final-generator (:final-generator workload)

(is (not (:valid? result)) (str "expected mismatch, got: " result))))

(deftest single-ok-concurrent-zincrby-still-validates-scores
;; Codex P1: :unknown-score? must NOT be set when exactly one
Contributor

medium

This comment contains LLM-generated artifacts (Codex P1). Please perform a global cleanup of such markers in the test file to ensure a professional codebase.

Suggested change
- ;; Codex P1: :unknown-score? must NOT be set when exactly one
+ ;; :unknown-score? must NOT be set when exactly one

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 623d5c2215

Comment on lines +129 to +130
(->> flat
(partition 2)

P2 Badge Reject malformed WITHSCORES payload length

parse-withscores currently uses partition 2, which silently drops a trailing unpaired element. If the server returns a malformed WITHSCORES reply under fault (e.g., a member without its score due to response corruption), the checker will truncate that evidence and may report a false green/unknown instead of surfacing the protocol violation. This workload is meant to catch safety bugs, so odd-length payloads should fail explicitly rather than being ignored.

bootjp added 3 commits April 23, 2026 20:09
Throwable.getMessage() can return nil when the exception was constructed
without a detail message. Without a guard, the op would record
:error nil -- an unhelpful signal that hides the real failure mode.
Wrap every .getMessage call feeding :error (or an error string) with
(or (.getMessage t) (str t)) so the diagnostic falls back to the
exception's class + inner state when no message is attached.

Applies to three sites in redis_zset_safety_workload.clj:
- coerce-zincrby-score's Throwable branch
- setup! cleanup-failed ex-info message
- invoke!'s catch-all :info error recorder

Scrub "(gemini MEDIUM)", "(codex P1)", "gemini HIGH", "CodeRabbit
finding", etc. from source and test comments. The substantive
explanations remain; only the review-trail jargon is removed so the
code reads as intentional design rather than a paper trail of
iterative bot feedback.

Files:
- jepsen/src/elastickv/redis_zset_safety_workload.clj (2 sites)
- jepsen/test/elastickv/redis_zset_safety_workload_test.clj (many)

elastickv-zset-safety-test was unconditionally overriding the workload
map's :final-generator to nil, citing "Jepsen 0.3.x can't fressian-
serialize some combined final gens." That blanket override cargo-culted
the convention introduced in a27267c for OTHER workloads (redis,
s3, dynamodb) that compose more complex final gens. This workload's
inner :final-generator is literally (gen/once {:f :zrange-all}) -- a
single Limit defrecord wrapping a 1-key Clojure map, which round-trips
through Jepsen 0.3.x's Fressian store cleanly (verified at 86 bytes
via a REPL serialization test).

Drop the override so the workload's one-shot final read survives, and
update the comment to explain why this workload doesn't need the
convention workaround the other workloads do.

Note: elastickv.cli / jepsen_test.clj don't currently invoke
:final-generator at the runner level, so this is cosmetic today but
future-proofs the workload against a runner that would honor it.
@bootjp
Owner Author

bootjp commented Apr 23, 2026

Addressing the 5 Gemini medium items. Three commits on top of 623d5c22:

Commits

  • 22e41e11 (Item 1): .getMessage nil guard. Wraps the three .getMessage sites that feed :error fields with (or (.getMessage t) (str t)) so the diagnostic never ends up literally nil when a Throwable was constructed without a detail message. Sites: coerce-zincrby-score's Throwable branch (L119), setup!'s cleanup-failed ex-info message (L187), and invoke!'s catch-all :info error recorder (L248).

  • 9f5e958d (Items 3 + 5): strip LLM reviewer artifact markers ((gemini MEDIUM), (codex P1), gemini HIGH, CodeRabbit finding, gemini HIGH batch 2, gemini HIGH (round 2), etc.) from all comments in the source and test file. Substantive explanations remain; only the paper-trail jargon is removed. Grepped the full tree afterward to confirm zero matches for (?i)(gemini|codex|coderabbit)\s*(MEDIUM|HIGH|LOW|P\d|finding|round|batch).

  • d0c8a03f (Item 4): drop the :final-generator nil override so the inner workload's (gen/once {:f :zrange-all}) passes through. Empirically verified in a lein repl that this simple generator round-trips through Jepsen 0.3.x's Fressian writer (jepsen.store.fressian/write-handlers) in 86 bytes: it's a single jepsen.generator.Limit defrecord wrapping a 1-key Clojure map, both of which are covered by fress/clojure-write-handlers. The previous cargo-culted convention from commit a27267ca0e (which set :final-generator nil in redis_workload.clj) targeted composed final gens, not simple one-shots; nothing in that history actually demonstrated a Fressian failure for the gen/once {:keyword} shape.
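
The nil guard from the first commit above boils down to a one-line fallback. A minimal sketch, assuming a hypothetical helper name `error-message` (the PR inlines the `or` form at each site rather than necessarily extracting a helper):

```clojure
;; Hypothetical helper: never yield nil for an :error field.
;; Throwable.getMessage returns nil when no detail message was supplied;
;; (str t) then falls back to Throwable.toString, i.e. the class name.
(defn error-message
  [^Throwable t]
  (or (.getMessage t) (str t)))
```

For example, `(error-message (RuntimeException.))` yields the class name instead of nil, so the Jepsen history still records something searchable.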

Rebuttal — Item 2 (timeout)

Not applied. Gemini is wrong on this one.

Code evidence: the 10000ms timeout is a repo-wide Jepsen convention, not a workload-specific outlier. Every elastickv Jepsen client that speaks to a network service uses a 10-second socket-level timeout:

  • jepsen/src/elastickv/redis_workload.clj:26: (rc/open h {:port p :timeout-ms 10000})
  • jepsen/src/elastickv/s3_workload.clj:53,67,78: :socket-timeout 10000 (per request)
  • jepsen/src/elastickv/redis_zset_safety_workload.clj:157: :timeout-ms 10000 (this PR)

Dropping just this one workload to 2000ms would create an inconsistent convention, AND the nemesis scenarios (partition / kill with fault-interval 40) can have single Redis operations legitimately blocked for more than 2 seconds during leader re-election — a sub-window timeout would bias the history heavily toward :info ops, reducing the checker's signal (see the vacuous-pass guard in zset-safety-checker which now emits :valid? :unknown when every read is downgraded). 10s is the right value for this faultset.

Test run

JAVA_HOME=/opt/homebrew/opt/openjdk@21 lein test elastickv.redis-zset-safety-workload-test
Ran 40 tests containing 66 assertions.
0 failures, 0 errors.

/gemini review

@codex review

Addresses codex P2 on 623d5c2.

parse-withscores used `(partition 2)` which silently drops a trailing
unpaired element. For a safety-focused workload this is exactly the
wrong failure mode: a WITHSCORES reply with a dangling member is a
protocol violation, and if the checker truncates the evidence rather
than surfacing it the anomaly is laundered into a false green/unknown
verdict.

Fix: validate `(odd? (count flat))` up front and throw `ex-info` with
the payload attached for debugging. The try/catch in invoke! routes
the throw into an `:info` op so the history records the violation
rather than dropping it.

Regression test `parse-withscores-rejects-odd-length-payload`
confirms a 3-element input throws with `"odd element count"`.

Tests: 41 tests / 67 assertions, 0 failures.
@bootjp
Owner Author

bootjp commented Apr 23, 2026

Addressed codex P2 on 623d5c2 in 559e83d.

parse-withscores silently dropped odd-length payloads. (partition 2) on a flat vector with a dangling trailing element discards the unpaired item without error. In a WITHSCORES response from Redis that's a protocol violation — exactly the kind of anomaly this safety workload is supposed to surface. Dropping it lets a corrupted reply be laundered into a false green (or :valid? :unknown after the round-2 vacuous-pass guard).

Fix: validate (odd? (count flat)) up front and throw ex-info with the payload attached for debugging:

(when (odd? (count flat))
  (throw (ex-info "WITHSCORES reply has odd element count"
                  {:count (count flat)
                   :payload flat})))

The general (catch Exception ...) in invoke! routes the throw to an :info op, so the history records the violation rather than dropping the evidence.

Regression test parse-withscores-rejects-odd-length-payload asserts a 3-element input throws with "odd element count".

Tests: 41 tests / 67 assertions, 0 failures (JDK 21, lein test elastickv.redis-zset-safety-workload-test).
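
To show the guard in context, here is a self-contained sketch of a WITHSCORES parser with the odd-length check. It is not the PR's parse-withscores: `parse-withscores-sketch` is a hypothetical name, and it assumes members and scores arrive as strings (the real parser may also handle bytes or special scores such as "inf", which `Double/parseDouble` does not accept).

```clojure
;; Hypothetical sketch: turn a flat [member score member score ...] reply
;; into a vector of [member score] pairs, rejecting odd-length payloads.
(defn parse-withscores-sketch
  [flat]
  (when (odd? (count flat))
    (throw (ex-info "WITHSCORES reply has odd element count"
                    {:count (count flat)
                     :payload flat})))
  ;; partition 2 is only safe once the length is known to be even;
  ;; on its own it would silently drop a trailing unpaired element
  (mapv (fn [[member score]]
          [(str member) (Double/parseDouble (str score))])
        (partition 2 flat)))
```

A nil or empty reply has count 0, so it passes the check and parses to an empty vector.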

/gemini review
@codex review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new Jepsen workload, elastickv-zset-safety-workload, to verify Redis ZSet safety properties such as score correctness, ordering, and atomicity under faults. It includes a custom model-based checker capable of handling concurrent mutations and uncertain operation outcomes, along with a suite of unit tests. The Jepsen test runner is also updated to dispatch to this new workload. Feedback focuses on improving the robustness of ZREM response parsing to prevent potential ClassCastException and enhancing the diagnostic clarity of error reports in range query checks by including the full set of allowed scores.

:value (pr-str v)}))))

:zrem
(let [member (:value op)
Contributor

medium

While the current implementation guards against nil responses from Carmine, it assumes that any non-nil response from zrem! is a numeric value that can be safely cast to a long. If Carmine returns a string representation of the integer (which can happen in some protocol edge cases or Carmine versions), (long removed) will throw a ClassCastException. Consider using a more robust parsing approach similar to coerce-zincrby-score or explicitly parsing the string if removed is not a number.

(let [model (model-before mutations-by-m inv-idx)
observed-members (into #{} (map first) members)]
(doseq [[member _] model]
(let [{:keys [must-be-present? scores unknown-score?]}
Contributor

medium

In the check-zrangebyscore error report, :expected-score is populated using (first scores). Since scores is a set of all admissible scores under concurrent mutations, picking an arbitrary one via first might be misleading if multiple scores are valid. While this is only for diagnostic purposes, it might be clearer to include the entire :allowed set in the error map, consistent with the :score-mismatch-range error above.

References
  1. Centralize common complex logic and maintain consistency in error reporting to improve maintainability.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Can't wait for the next one!

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new Jepsen workload, elastickv-zset-safety-workload, designed to verify safety properties of the Redis ZSet implementation, including score correctness, ordering, and atomicity. It includes a custom model-based checker, a client implementation using Carmine, and comprehensive unit tests for the checker's logic. A review comment suggests catching Throwable instead of Exception in the client's invoke! method to improve robustness against unexpected errors.

"WITHSCORES"))]
(assoc op :type :ok :value {:bounds [lo hi]
:members (parse-withscores flat)})))
(catch Exception e
Contributor

medium

In Jepsen client implementations, it is generally safer to catch Throwable rather than just Exception in the invoke! loop. This ensures that any unexpected errors (such as java.lang.Error subclasses like NoClassDefFoundError) are caught, logged, and reported as :info operations. This prevents the worker thread from crashing, which can cause the Jepsen run to hang or fail ungracefully.

Suggested change
- (catch Exception e
+ (catch Throwable e

bootjp added 2 commits April 23, 2026 20:21
Some Carmine versions / RESP3 codepaths surface ZREM's reply as a
numeric string ("1") or raw bytes rather than a Long. The previous
(long (or removed 0)) guard handled nil but still threw
ClassCastException on string/bytes, falling through to the general
Exception handler and masking the real signal.

Add coerce-zrem-count helper (parallel to coerce-zincrby-score) that
accepts Number / String / bytes / nil / other and returns a
non-negative long, with unparseable values treated as 0 so the op
stays :ok.

Covers regression with tests exercising string "1", string "0", bytes,
and unexpected keyword replies.

:missing-member-range previously reported :expected-score (first
scores), which is misleading when concurrent writers leave multiple
admissible scores -- picking an arbitrary element hides the other
valid linearizations from consumers reading the error map.

Add :allowed scores to the error map (matching the sibling
:score-mismatch-range convention). :expected-score is retained for
backward compatibility but is only populated when the admissible set
has exactly one element; with >1 admissible scores it is nil, forcing
consumers to look at :allowed instead of silently reading a
half-truth.

Covers the new error shape with tests for both the multi-score
(several concurrent writers) and single-score (sanity / backcompat)
cases.
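
The new error shape might look roughly like the sketch below. Only `:allowed` and `:expected-score` come from the commit message above; the function name and the `:kind`, `:member`, and `:bounds` keys are assumptions made for illustration.

```clojure
;; Hypothetical constructor for the :missing-member-range error map.
;; `scores` is the set of admissible scores for the member.
(defn missing-member-range-error
  [member scores bounds]
  {:kind           :missing-member-range
   :member         member
   :bounds         bounds
   ;; the full admissible set, matching :score-mismatch-range
   :allowed        scores
   ;; retained for backward compatibility: populated only when the
   ;; admissible set is unambiguous, nil otherwise
   :expected-score (when (= 1 (count scores)) (first scores))})
```

With two admissible scores, `:expected-score` is nil and a consumer has to read `:allowed`.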
@bootjp
Owner Author

bootjp commented Apr 23, 2026

Addressed both gemini medium items.

  • 1a9370f4 fix(jepsen-zset): coerce ZREM count across Long / string / bytes
    Added coerce-zrem-count helper (parallel to coerce-zincrby-score) so the :zrem invoke path no longer throws ClassCastException on string / bytes replies from RESP3 or Carmine variants. Nil still maps to 0 (removed? false); unparseable values also map to 0 rather than throwing, keeping the op :ok. Regression tests cover string "1", string "0", raw bytes, and unexpected keyword replies.

  • ad9079c6 fix(jepsen-zset): include full :allowed set in :missing-member-range
    :missing-member-range now carries :allowed scores (matching the sibling :score-mismatch-range convention) instead of only :expected-score (first scores). :expected-score is retained for backcompat but is only populated when exactly one score is admissible; otherwise it is nil so consumers cannot silently read a misleading half-truth. New tests cover both the multi-admissible-score case (two concurrent ZADDs) and the single-score backcompat case.
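For reference, the coercion described in the first item could look roughly like the sketch below. This is not the code from 1a9370f4; it only illustrates the stated contract (Number / String / bytes / nil / other in, non-negative long out, unparseable values mapped to 0), and the body is an assumption.

```clojure
(defn coerce-zrem-count
  "Sketch: normalise a ZREM reply to a non-negative long.
  Unparseable or unexpected values map to 0 instead of throwing."
  [reply]
  (let [parse (fn [^String s]
                (try
                  (Long/parseLong (.trim s))
                  (catch NumberFormatException _ 0)))
        n     (cond
                (nil? reply)    0
                (number? reply) (long reply)
                (string? reply) (parse reply)
                ;; Decode bulk-string bytes explicitly; (str bytes) would
                ;; yield the array's identity string, not its contents.
                (bytes? reply)  (parse (String. ^bytes reply "UTF-8"))
                :else           0)]
    (max 0 n)))

;; removed? can then be derived as (pos? (coerce-zrem-count reply)).
```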

Local run (JAVA_HOME=/opt/homebrew/opt/openjdk@21 lein test elastickv.redis-zset-safety-workload-test): 47 tests, 83 assertions, 0 failures, 0 errors.

/gemini review
@codex review

…rkers

Addresses gemini medium on 559e83d.

The `invoke!` `(catch Exception e)` won't catch `java.lang.Error`
subclasses (NoClassDefFoundError, OutOfMemoryError, LinkageError, ...).
In Jepsen those normally propagate, crash the worker thread, and can
hang the run or leave history gaps that mask the underlying failure.

Widen to `(catch Throwable t)`, matching the pattern already used one
call site up in zincrby's coerce branch at line 224. The op is still
recorded as :info with a descriptive :error field, so the history
shows exactly which operation hit the Error and why.

Tests: 47 tests / 83 assertions, 0 failures.
@bootjp
Owner Author

bootjp commented Apr 23, 2026

Addressed gemini medium (catch Throwable in invoke!) in 69db24e.

The (catch Exception ...) at line 290 wouldn't catch java.lang.Error subclasses (NoClassDefFoundError, OutOfMemoryError, LinkageError, ...). In a Jepsen workload those normally propagate out, crash the worker thread, and can hang the run or leave history gaps that mask the underlying failure.

Widened to (catch Throwable t), matching the pattern already used at line 224 in the zincrby coerce branch. The op is still recorded as :info with a descriptive :error field.

(catch Throwable t
  (warn t (str "ZSet safety op failed: " (:f op)))
  (assoc op :type :info :error (or (.getMessage ^Throwable t) (str t))))

Tests: 47 tests / 83 assertions, 0 failures.
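To make the difference concrete, here is a small standalone sketch. `run-op` is a hypothetical stand-in for the workload's invoke! path, not the real function; it only shows that a `java.lang.Error` subclass is turned into an :info op when the handler catches Throwable.

```clojure
(defn run-op
  "Hypothetical stand-in for invoke!: runs thunk `f` and returns the op map,
  downgrading any Throwable (including java.lang.Error) to an :info op."
  [op f]
  (try
    (f)
    (assoc op :type :ok)
    (catch Throwable t
      ;; Fall back to (str t) when the Throwable carries no message.
      (assoc op :type :info :error (or (.getMessage t) (str t))))))

;; LinkageError is an Error, not an Exception, so (catch Exception e)
;; would let it escape; here it is recorded on the op instead.
(run-op {:f :zadd} #(throw (LinkageError. "boom")))
```

Whether every Error should be swallowed this way is a design choice; the point here is only that the history keeps a record of which op failed.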

/gemini review
@codex review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new Jepsen workload, elastickv-zset-safety-workload, designed to verify safety properties of Redis ZSet implementations under faults. The workload includes a custom model-based checker that validates score correctness, ordering, range queries, and atomicity. Additionally, the PR updates the main Jepsen test entry point to support workload dispatching and adds comprehensive unit tests for the new workload's client and checker logic. I have no feedback to provide as there are no review comments.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Breezy!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new Jepsen workload, redis-zset-safety-workload, designed to verify advanced safety properties of Redis ZSet implementations, such as score correctness, ordering, and atomicity under faults. It includes a custom model-based checker and comprehensive unit tests for the workload's logic. Feedback was provided regarding the handling of Redis bulk strings returned as byte arrays in parse-double-safe and coerce-zincrby-score to ensure correct parsing of scores.

Comment on lines +77 to +78
(let [raw (str s)
      lower (str/lower-case raw)]
Contributor

high

In Clojure, calling (str s) on a byte array returns the object's string representation (e.g., "[B@..."), which is not the intended behavior when parsing numeric scores from Redis. Since Redis scores can be returned as bulk strings (which Carmine may surface as byte arrays depending on the RESP version or configuration), they should be explicitly converted to a UTF-8 string before parsing.

Suggested change
- (let [raw (str s)
-       lower (str/lower-case raw)]
+ (let [raw (if (bytes? s) (String. ^bytes s "UTF-8") (str s))
+       lower (str/lower-case raw)]
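A self-contained sketch of the decode-then-parse idea in this suggestion. `reply->str` and `parse-score` are hypothetical names, and the handling of "inf" spellings is an assumption about what a score parser might need; it is not the body of parse-double-safe.

```clojure
(require '[clojure.string :as str])

(defn reply->str
  "Hypothetical helper: decode a reply to a String, treating byte arrays
  as UTF-8 rather than stringifying the array object."
  [s]
  (if (bytes? s) (String. ^bytes s "UTF-8") (str s)))

(defn parse-score
  "Sketch of a score parser; returns nil for unparseable input."
  [s]
  (let [lower (str/lower-case (str/trim (reply->str s)))]
    (case lower
      ("inf" "+inf") Double/POSITIVE_INFINITY
      "-inf"         Double/NEGATIVE_INFINITY
      (try
        (Double/parseDouble lower)
        (catch NumberFormatException _ nil)))))

;; Byte-array replies parse the same way as strings.
(parse-score (.getBytes "1.5" "UTF-8"))
```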

(number? response)
[:ok (double response)]

(string? response)
Contributor

medium

Carmine may return the result of a ZINCRBY operation as a byte array (bulk string). The current implementation only handles string? and number?, which causes byte array responses to fall through to the :else case and be treated as :unexpected (resulting in an :info operation type). Adding bytes? support ensures these operations are correctly parsed and recorded as :ok.

Suggested change
- (string? response)
+ (or (string? response) (bytes? response))
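Widening the predicate alone is only safe if the branch body also decodes the bytes before parsing. A hedged sketch of the full dispatch follows; `coerce-score-reply` is a hypothetical analogue of coerce-zincrby-score, and the `[:ok score]` / `[:unexpected reply]` return shape is inferred from the snippet and comment above.

```clojure
(defn coerce-score-reply
  "Hypothetical analogue of coerce-zincrby-score.
  Returns [:ok score] or [:unexpected reply]."
  [response]
  (cond
    (number? response)
    [:ok (double response)]

    (or (string? response) (bytes? response))
    (let [^String s (if (bytes? response)
                      (String. ^bytes response "UTF-8")
                      response)]
      (try
        [:ok (Double/parseDouble s)]
        (catch NumberFormatException _
          [:unexpected response])))

    :else
    [:unexpected response]))

;; A bulk-string reply now yields an :ok result rather than :unexpected.
(coerce-score-reply (.getBytes "2.5" "UTF-8"))
```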


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d03672e3d4

Comment on lines +612 to +614
any-candidate-zrem? (some #(= :zrem (:f %)) candidates)
any-uncertain-write? (some write-op? uncertain)
any-uncertain-zrem? (some #(= :zrem (:f %)) uncertain)

P1: Ignore no-op ZREM when deriving possible absence

any-candidate-zrem?/any-uncertain-zrem? currently classify every :zrem as a possible deletion, even when the op completed :ok with removed? = false (a confirmed no-op). That relaxes must-be-present? incorrectly and can let impossible empty reads pass: e.g., overlapping ZADD(m1) and ZREM(m1)->0 both complete before a read, and the checker accepts a read that omits m1 even though ZREM’s return value constrains any valid linearization to end with m1 present. This creates false negatives in the safety checker.
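One way to express the distinction is sketched below. `possible-removal?` is a hypothetical predicate, and storing the server's answer under a top-level `:removed?` key on the op is an assumption about the history shape.

```clojure
(defn possible-removal?
  "True when a :zrem op could have deleted its member: anything except a
  completed :ok ZREM that reported removed? = false (a confirmed no-op)."
  [op]
  (and (= :zrem (:f op))
       (not (and (= :ok (:type op))
                 (false? (:removed? op))))))

;; The scenario from the comment: ZADD(m1) and a no-op ZREM(m1) both
;; completed before the read.
(def candidates
  [{:f :zadd :type :ok :member "m1" :score 1.0}
   {:f :zrem :type :ok :member "m1" :removed? false}])

;; The no-op ZREM no longer counts as a possible deletion.
(boolean (some possible-removal? candidates)) ;; => false
```

An :info ZREM, or an :ok ZREM with `removed?` true, still satisfies the predicate, so genuinely uncertain deletions continue to relax the presence check.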

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29e62ca83f

Comment on lines +561 to +564
pre-read-info (->> muts
                   (filter #(and (= :info (:type %))
                                 (some? (:complete-idx %))
                                 (< (:complete-idx %) read-inv-idx))))

P1: Exclude superseded pre-read :info ops from uncertainty

Treating every pre-read :info mutation as uncertain here causes false negatives when a later committed write to the same member exists before the read. For example, an :info :zincrby that completed before a subsequent :ok :zadd is still added to uncertain, which can set :unknown-score? (or admit stale scores) and let impossible read values pass even though the later committed write should deterministically overwrite any earlier effect. The same pattern can also relax required-presence checks via stale :info :zrem entries.
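A rough sketch of the filtering this comment asks for. The names `superseded?` and `uncertain-pre-read-info`, and the `:invoke-idx` and `:member` keys, are assumptions; only `:complete-idx` and `read-inv-idx` appear in the quoted code. The sketch treats only an :ok ZADD as superseding, since ZADD overwrites the score outright, whereas a later ZINCRBY would still depend on the uncertain earlier effect.

```clojure
(defn superseded?
  "True when an :ok ZADD to the same member was invoked after `info-op`
  completed and itself completed before the read was invoked."
  [info-op muts read-inv-idx]
  (boolean
   (some (fn [m]
           (and (= :ok (:type m))
                (= :zadd (:f m))
                (= (:member m) (:member info-op))
                (some? (:invoke-idx m))
                (some? (:complete-idx m))
                (> (:invoke-idx m) (:complete-idx info-op))
                (< (:complete-idx m) read-inv-idx)))
         muts)))

(defn uncertain-pre-read-info
  "Pre-read :info mutations that are still genuinely uncertain."
  [muts read-inv-idx]
  (->> muts
       (filter #(and (= :info (:type %))
                     (some? (:complete-idx %))
                     (< (:complete-idx %) read-inv-idx)))
       (remove #(superseded? % muts read-inv-idx))))

;; An :info ZINCRBY followed by a committed ZADD on the same member.
(def example-muts
  [{:f :zincrby :type :info :member "m1" :invoke-idx 0 :complete-idx 1}
   {:f :zadd    :type :ok   :member "m1" :invoke-idx 2 :complete-idx 3}])

;; For a read invoked at index 4 the :info op is superseded and dropped.
(uncertain-pre-read-info example-muts 4)
```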
